Abstract

We study maximum likelihood estimation for both directed and undirected random graph models in which
the degree sequences are minimal sufficient statistics. In the undirected case, the model is known as
the beta model. We derive necessary and sufficient conditions for the existence of the MLE that are based
on the polytope of degree sequences. We characterize in a combinatorial fashion the sample points leading
to a nonexistent MLE, and the non-estimability of the probability parameters under a nonexistent MLE.
We formulate conditions that guarantee that the MLE exists with probability tending to one as the number
of nodes increases. We illustrate our approach on other random graph models for networks, such as the
Rasch model, the Bradley-Terry model and the more general pi model of Holland and Leinhardt (1981).

Keywords: beta model, polytope of degree sequences, random graphs, Rasch model, pi model

1 Introduction 

Many statistical models for the representation and analysis of network data rely on information contained
in the degree sequence, the vector of node degrees of the observed graph. Node degrees not only quantify
the overall connectivity of the network, but also reveal other potentially more refined features of
interest. The study of the degree sequences and, in particular, of the degree distributions of real networks
is a classic topic in network analysis, which has received extensive treatment in the statistical literature
(see, e.g., Holland and Leinhardt, 1981; Fienberg and Wasserman, 1981a; Fienberg et al., 1985), the physics
literature (see, e.g., Newman et al., 2001; Albert and Barabasi, 2002; Newman, 2003; Park and Newman,
2004; Newman et al., 2006; Foster et al., 2007; Willinger et al., 2009) as well as in the social network
literature (see, e.g., Robins et al., 2008; Goodreau, 2007; Handcock and Morris, 2007, and references therein).
See also the monograph by Goldenberg et al. (2010) and the books by Kolaczyk (2009), Cohen and Havlin
(2010) and Newman (2010).

The simplest instance of a statistical network model based exclusively on the node degrees is the
exponential family of probability distributions for undirected random graphs with the degree sequence as its
natural sufficient statistic. This is in fact a simpler, undirected version of the broader class of statistical
models for directed networks known as the pi-models, introduced by Holland and Leinhardt (1981). We will
refer to this model as the beta model (henceforth the β-model), a name recently coined by Chatterjee et al.
(2011), and refer to Blitzstein and Diaconis (2009) for details and extensive references.

Despite its apparent simplicity and popularity, the β-model, much like most network models, exhibits
non-standard statistical features, since its complexity, measured by the dimension of the parameter space,
increases with the size of the graph. Lauritzen (2003, 2008) characterized β-models as the natural models for
representing exchangeable binary arrays that are weakly summarized, i.e., random arrays whose distribution
only depends on the row and column totals. More recently, Chatterjee et al. (2011) conducted an analysis of
the asymptotic properties of the β-model, including existence and consistency of the maximum likelihood
estimator (MLE) as the dimension of the network increases, and provided a simple algorithm for estimating the
natural parameters. They also characterized the graph limits, or graphons (see Lovász and Szegedy, 2006),
corresponding to a sequence of β-models with given degree sequences (for a connection between the theory
of graphons and exchangeable arrays see Diaconis and Janson, 2007). Concurrently, Barvinok and Hartigan
(2010) explored the asymptotic behavior of sequences of random graphs with given degree sequences, and
studied a different mode of stochastic convergence. Among other things, they show that, as the size of the
network increases and under a "tameness" condition, the number of edges of a uniform graph with given
degree sequence converges in probability to the number of edges of a random graph drawn from a β-model
parametrized by the MLE corresponding to the degree sequence.

Subsequent to the submission of our article, Yan and Xu (2012) and Yan et al. (2012) derived asymptotic
conditions for uniform consistency and asymptotic normality of the MLE of the β-model, and asymptotic
normality of the likelihood ratio test for homogeneity of the model parameters. Perry and Wolfe (2012)
consider a general class of models for network data parametrized by node-specific parameters, of which the
β-model is a special case. The authors show that, under suitable conditions, the MLEs of model parameters
exist and can be well approximated by simple estimators.

In an attempt to avoid the reliance on asymptotic methods, whose applicability to network models
remains largely unclear (see, e.g., Haberman, 1981), several researchers have turned to exact inference for the
β-model, which hinges upon the non-trivial task of sampling from the set of graphs with a given degree
sequence. Blitzstein and Diaconis (2009) developed and analyzed a sequential importance sampling algorithm
for generating a random graph with the prescribed degree sequence (see also Viger and Latapy, 2005, for a
different algorithm). Hara and Takemura (2010) and Ogawa et al. (2011) tackled the same task using more
abstract algebraic methods and Petrovic et al. (2010) studied Markov bases for the more general pi model.

In this article we study the existence of the MLE for the parameters of the beta model under a more
general sampling scheme in which each edge is observed a fixed number of times (instead of just once, as
in previous works) and for increasing network sizes. The reason for our focus on the issue of existence
of the MLE, which we view as a natural measure of the intrinsic statistical difficulty of the beta model, is
twofold. First, existence of the MLE is a natural minimum requirement for feasibility of statistical inference
in discrete exponential families, such as the beta model: nonexistence of the MLE is in fact equivalent to
non-estimability of the model parameters, as illustrated in Fienberg and Rinaldo (2011). Thus, establishing
conditions for existence of the MLE amounts to specifying the conditions under which statistical inference for
these models is fully possible. Secondly, under the asymptotic scenario of growing network sizes, existence
of the MLE will provide a natural measure of sample complexity of the beta model, and will indicate the
asymptotic scaling of the model parameters for which statistical inference is viable. In fact, our results from
Section 4 will prove that the parameters of the beta model can be estimated consistently even when the
edge probabilities approach 0 or 1, provided the network size is sufficiently large and under appropriate
conditions.

Though prior studies of the beta model by Chatterjee et al. (2011) and Barvinok and Hartigan (2010)^1
also revolve around the very same issue of existence of the MLE, our method of analysis is significantly
different from existing contributions in that it is rooted in the statistical theory of discrete linear exponential
families and relies in a fundamental way on the geometric properties of these families (see, in particular,
Rinaldo et al., 2009; Geyer, 2009). Our contributions are as follows.

• We provide explicit necessary and sufficient conditions for existence of the MLE for the beta model that
are based on the polytope of degree sequences, a well-studied polytope arising in the study of threshold
graphs (see Mahadev and Peled, 1996). In contrast, the conditions of Chatterjee et al. (2011)
are only sufficient. We then show that non-existence of the MLE is brought on by certain forbidden
patterns of extremal network configurations, which we fully characterize in a combinatorial way.
Furthermore, when the MLE does not exist, we can identify exactly which probability parameters are
estimable. To illustrate our findings, we rely on the computational geometry software polymake (see
Gawrilow and Joswig, 2000) to compute the forbidden configurations leading to nonexistence of the
MLE for some simple beta models.

• We use the properties of the polytope of degree sequences to formulate geometric conditions that
allow us to derive finite sample bounds on the probability that the MLE does not exist. In particular,
our results imply that the MLE exists with overwhelming probability even when the edge probabilities
tend to zero or one as the network grows, a case unaccounted for by existing literature. Our asymptotic
results improve analogous results of Chatterjee et al. (2011) and our proof is both simpler and more
direct. Furthermore, we show that the tameness condition of Barvinok and Hartigan (2010) is stronger
than our conditions for existence of the MLE.

• Our analysis is not specific to the beta model but, in fact, follows a principled way for detecting
nonexistence of the MLE and identifying non-estimable parameters that is based on polyhedral geometry and
applies more generally to discrete models. We illustrate this point by analyzing other network models
that are variations or generalizations of the beta model: the Rasch model, the Bradley-Terry model and
the pi model.

Finally, we remark that our results arise as non-trivial applications of the geometric and combinatorial
properties of log-linear models under general sampling schemes, as thoroughly described in the companion
paper Fienberg and Rinaldo (2011), to which the reader is referred for further details as well as for practical
algorithms.

The paper is organized as follows. In Section 2 we introduce a generalized version of the beta model in
which we observe the edges of a graph a fixed number of times, possibly larger than one, and we express
it as a natural exponential family with linear sufficient statistics. We obtain the beta model as a special
case in which we observe edges only once. In Section 3 we introduce the polytope of degree sequences and
use it to derive necessary and sufficient conditions for the existence of the MLE. In particular, we
characterize the patterns of edge counts for which the MLE does not exist, called co-facial sets. In Section 3.1 we
show a number of examples of co-facial sets, obtained using polymake. Furthermore, we use a result from
Mahadev and Peled (1996) to show in Section 3.2 how to construct virtually any example of random graphs
for which the MLE of the beta parameters does not exist. In Section 4 we once again use the polytope of
degree sequences to obtain finite sample bounds on the probability that the MLE does not exist. As the
number of objects to be compared increases, the MLE exists with probability approaching one. In Section
5 and in the appendix we describe an algorithm for computing and identifying facial sets. In Section 6, we
apply our theory and algorithm to a variety of other, related models: the Rasch model, a generalized beta
model with no sampling restriction on the number of observed edges, the Bradley-Terry model and the pi
model for directed networks of Holland and Leinhardt (1981).

^1 In the analysis of Barvinok and Hartigan (2010), the maximum entropy matrix associated to a degree sequence is in fact exactly
the MLE corresponding to the observed degree sequence. This is a well-known property of linear exponential families: see, e.g.,
Cover and Thomas (1991, Chapter 11).

Notation

For vectors $x$ and $y$ in the Euclidean space $\mathbb{R}^n$, we will denote with $x_i$ the value of $x$ at its $i$-th coordinate and
with $\langle x, y \rangle := x^\top y = \sum_i x_i y_i$ their standard inner product. Operations on vectors will be performed
element-wise. For a matrix $A$, $\mathrm{convhull}(A)$ and $\mathrm{cone}(A)$ denote the set of all convex and conic combinations of the
columns of $A$, respectively. For a polyhedron $P$, we denote with $\mathrm{ri}(P)$ its relative interior. We will assume
throughout some familiarity with basic concepts from polyhedral geometry (see, e.g., Schrijver, 1998) and
the theory of exponential families (see, e.g., Barndorff-Nielsen, 1978; Brown, 1986).

2 The (Generalized) Beta Model 

In this section we describe a simple generalization of the beta model and introduce the exponential family 
parametrization we will be using throughout the article. Though our analysis applies to the generalized beta 
model and recovers the original beta model as described in Chatterjee et al. (2011) as a special case, for 
simplicity and with slight abuse of notation, we will refer to our more general setting as the beta model as 
well. 

The beta model is concerned with the occurrence of edges in a simple undirected random graph, with
the nodes labeled $\{1, \ldots, n\}$ for convenience. The associated statistical experiment consists of recording, for
each pair of nodes $(i,j)$ with $i < j$, the number of edges appearing in $N_{i,j}$ distinct observations, where the
integers $\{N_{i,j}, i < j\}$ are deterministic and positive (both the non-randomness and positivity assumptions
can in fact be relaxed). For $i < j$, we denote with $x_{i,j}$ the number of times the edge $(i,j)$ was observed and,
accordingly, with $x_{j,i}$ the number of times the edge $(i,j)$ was missing. Thus, for all $i < j$,

$$x_{i,j} + x_{j,i} = N_{i,j}.$$

The beta model is the natural heterogeneous version of the well-known Erdős-Rényi random graph model (Erdős and Rényi,
1959). For a discussion of this model and its generalizations see Goldenberg et al. (2010). The observed
edge counts $\{x_{i,j}, i < j\}$ are modeled as draws from mutually independent binomial distributions, with
$x_{i,j} \sim \mathrm{Bin}(N_{i,j}, p_{i,j})$, where $p_{i,j} \in (0,1)$ for each $i < j$. Accordingly, $x_{j,i} = N_{i,j} - x_{i,j}$ has a $\mathrm{Bin}(N_{i,j}, p_{j,i})$
distribution, where $p_{j,i} = 1 - p_{i,j}$.

Data arising from such an experiment can be naturally represented through an $n \times n$ contingency table
with empty diagonal cells and whose $(i,j)$-th cell contains the count $x_{i,j}$, $i \neq j$. For modeling purposes,
however, it is enough to consider the upper-triangular part of this contingency table. Indeed, since, given
$x_{i,j}$, the value of $x_{j,i}$ is determined by $N_{i,j} - x_{i,j}$, the set of all possible outcomes can be represented more
parsimoniously as the following subset of $\mathbb{N}^{\binom{n}{2}}$:

$$S_n := \left\{ x_{i,j} : i < j \text{ and } x_{i,j} \in \{0, 1, \ldots, N_{i,j}\} \right\}.$$

Throughout the article, we index the coordinates $\{(i,j) : i < j\}$ of any point $x$ in the sample space $S_n$
lexicographically.

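In the beta model, the edge probabilities are parametrized by points $\beta \in \mathbb{R}^n$ as follows. For each
$\beta \in \mathbb{R}^n$, the probability parameters are uniquely determined as

$$p_{i,j} = \frac{e^{\beta_i + \beta_j}}{1 + e^{\beta_i + \beta_j}}, \quad \forall\, i < j, \tag{1}$$

or, equivalently, in terms of odds ratios,

$$\log \frac{p_{i,j}}{1 - p_{i,j}} = \beta_i + \beta_j, \quad \forall\, i < j. \tag{2}$$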
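Therefore, for a given choice of $\beta$, the probability of observing the vector of edge counts $x \in S_n$ is

$$p_\beta(x) = \prod_{i<j} \binom{N_{i,j}}{x_{i,j}} \, p_{i,j}^{x_{i,j}} (1 - p_{i,j})^{N_{i,j} - x_{i,j}}, \tag{3}$$

with the probability values $p_{i,j}$ satisfying (1). Simple algebra shows that this expression can be written in
exponential family form as

$$p_\beta(x) = \exp\left\{ \sum_{i=1}^n d_i \beta_i - \psi(\beta) \right\} \prod_{i<j} \binom{N_{i,j}}{x_{i,j}}, \tag{4}$$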

where the coordinates of the vector of minimal sufficient statistics $d = d(x) \in \mathbb{N}^n$ are

$$d_i := \sum_{j<i} x_{j,i} + \sum_{j>i} x_{i,j}, \quad i = 1, \ldots, n, \tag{5}$$

and the log-partition function $\psi \colon \mathbb{R}^n \to \mathbb{R}$ is given by $\beta \mapsto \sum_{i<j} N_{i,j} \log\left(1 + e^{\beta_i + \beta_j}\right)$. Note that $e^{\psi(\beta)} < \infty$
for all $\beta \in \mathbb{R}^n$, so $\mathbb{R}^n$ is the natural parameter space of the full and steep exponential family with support $S_n$
(see, e.g., Barndorff-Nielsen, 1978) and densities given by the exponential term in (4). We take note that in
the beta model parametrization the probability of an undirected simple graph with possibly multiple edges
is fully determined only by the $n$ natural parameters in $\beta$ instead of the $\binom{n}{2}$ edge probability parameters
$\{p_{i,j}, i < j\}$.
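
As a minimal numerical sketch of the parametrization above, the following Python snippet evaluates the edge probabilities (1), the sufficient statistics (5), and the log-likelihood both in the product-of-binomials form (3) and in the exponential family form (4). The function names, the 0-based node labels and the small data set are illustrative assumptions, not part of the original analysis.

```python
import math
from itertools import combinations


def edge_probs(beta):
    """Edge probabilities p[(i, j)], i < j, as in equation (1)."""
    n = len(beta)
    return {(i, j): 1.0 / (1.0 + math.exp(-(beta[i] + beta[j])))
            for i, j in combinations(range(n), 2)}


def suff_stats(x, n):
    """Sufficient statistics d_i of equation (5): total edge count at node i."""
    d = [0] * n
    for (i, j), count in x.items():
        d[i] += count
        d[j] += count
    return d


def loglik_binomial(beta, x, N):
    """Logarithm of the product-of-binomials form (3)."""
    p = edge_probs(beta)
    return sum(math.log(math.comb(N[e], x[e]))
               + x[e] * math.log(p[e])
               + (N[e] - x[e]) * math.log(1.0 - p[e])
               for e in x)


def loglik_expfam(beta, x, N):
    """Logarithm of the exponential family form (4)."""
    d = suff_stats(x, len(beta))
    psi = sum(N[(i, j)] * math.log1p(math.exp(beta[i] + beta[j])) for (i, j) in x)
    log_base = sum(math.log(math.comb(N[e], x[e])) for e in x)
    return sum(di * bi for di, bi in zip(d, beta)) - psi + log_base


# Hypothetical example: n = 4 nodes (0-indexed), each edge observed N = 3 times.
n = 4
N = {e: 3 for e in combinations(range(n), 2)}
x = {(0, 1): 1, (0, 2): 2, (0, 3): 3, (1, 2): 0, (1, 3): 2, (2, 3): 1}
beta = [0.3, -0.5, 0.1, 0.8]
```

On this example the two log-likelihood functions should agree up to floating-point error, which is the "simple algebra" step connecting (3) and (4).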



Random graphs with fixed degree sequence 

In the special case in which $N_{i,j} = 1$ for all $i < j$, the support $S_n$ reduces to the set $\mathcal{G}_n := \{0, 1\}^{\binom{n}{2}}$, which
encodes all undirected simple graphs on $n$ nodes: for any $x \in \mathcal{G}_n$, the corresponding graph has an edge
between nodes $i$ and $j$, with $i < j$, if and only if $x_{i,j} = 1$. In this case the beta model yields a class of
distributions for random undirected simple graphs on $n$ nodes, where the edges are mutually independent
Bernoulli random variables with probabilities of success $\{p_{i,j}, i < j\}$ satisfying (1). Then, by (5), the $i$-th
minimal sufficient statistic $d_i$ is the degree of node $i$, i.e. the number of nodes adjacent to $i$, and the vector
$d(x)$ of sufficient statistics is the degree sequence of the observed graph $x$. This is precisely the version of the
beta model studied by Chatterjee et al. (2011).



The Rasch model 

The Rasch model (see, e.g., Rasch, 1960; Andersen, 1980) is concerned with modeling the joint probabilities
that $k$ subjects provide correct answers to a set of $l$ items, and is one of the most popular statistical models
used in item response theory and in educational tests. This model can be recast as a random bipartite graph
model in which, without loss of generality, the bipartition of the nodes consists of the sets $I := \{1, \ldots, k\}$
and $J := \{k + 1, \ldots, n - 1, n\}$, with $k \geq 2$ and $l = n - k \geq 2$. The set $I$ represents the subjects and the set $J$
the items, and edges can only be of the form $(i, j)$, with $i \in I$ and $j \in J$. In particular, the presence of an
edge $(i,j)$ indicates that the $i$-th subject has responded correctly to the $j$-th item. The sample space is given
by the set $\mathcal{R}_n = \{0, 1\}^{kl}$, and the vector $x = \{x_{i,j}, i \in I, j \in J\} \in \mathcal{R}_n$ encodes the bipartite graph in which
the edge $(i, j)$ is present if and only if $x_{i,j} = 1$, i.e. if and only if subject $i$ answered item $j$ correctly.

The Rasch model (see, e.g., Rasch, 1960) is then formulated by assuming that the edge probabilities
satisfy equation (1), for some $\beta \in \mathbb{R}^n$. Then, it follows directly from our discussion above that the Rasch
model is a beta model for bipartite graphs, and, in particular, that the degree sequence provides the sufficient
statistics.



3 Existence of the MLE for the Beta Model 

In this section we derive a necessary and sufficient condition for the existence of the MLE of the natural
parameter $\beta \in \mathbb{R}^n$ of the beta model or, equivalently, of the probability parameters $\{p_{i,j}, i < j\}$ as defined in
(1). For a given $x \in S_n$, we say that the MLE does not exist when

$$\left\{ \beta^* : p_{\beta^*}(x) = \sup_{\beta \in \mathbb{R}^n} p_\beta(x) \right\} = \emptyset,$$

where $p_\beta(x)$ is given in (4). Notice that nonexistence of the MLE entails, in the case of the natural
parameters, that the supremum of the likelihood function (4) cannot be attained by any finite vector in $\mathbb{R}^n$, and, in
the case of the probability parameters, that the supremum of (3) cannot be attained by any set of probability
values bounded away from 0 and 1, and satisfying the equations (1).

We will formulate conditions for the existence of the MLE for the beta model based on a geometric object
that will play a key role throughout the rest of the paper: the polytope of degree sequences. To this end, note
that, for each $x \in S_n$, the vector of sufficient statistics $d(x)$ for the beta model can be obtained as

$$d(x) = Ax,$$

where $A$ is the $n \times \binom{n}{2}$ design matrix consisting of the node-edge incidence matrix of a complete graph on
$n$ nodes. Specifically, the rows of $A$ are indexed by the node labels $i \in \{1, \ldots, n\}$, and the columns are
indexed by the set of all pairs $(i,j)$ with $i < j$, ordered lexicographically. The entries of $A$ are ones along the
coordinates $(i, (i,j))$ when $i < j$ and $(i, (j, i))$ when $j < i$, and zeros otherwise. For instance, when $n = 4$,

$$A = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 \end{bmatrix},$$

where the columns are indexed lexicographically by the pairs (1,2), (1,3), (1,4), (2,3), (2,4), and (3,4).
In particular, as pointed out above, for any undirected simple graph $x \in \mathcal{G}_n$, $Ax$ is the associated degree
sequence. The polytope of degree sequences $P_n$ is the convex hull of all possible degree sequences, i.e.

$$P_n := \mathrm{convhull}\left(\{Ax, x \in \mathcal{G}_n\}\right).$$

The integral polytope $P_n$ is a well-studied object: see Chapter 3 in Mahadev and Peled (1996). In the
language of algebraic statistics, $P_n$ is called the model polytope (see Sturmfels and Welker, 2011). In particular,
when $n = 2$, $P_n$ is just a line segment in $\mathbb{R}^2$ connecting the points (0, 0) and (1, 1), while, for all $n \geq 3$,

$$\dim(P_n) = n.$$
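
The construction of the design matrix can be sketched in a few lines of Python. The snippet below builds the node-edge incidence matrix with lexicographically ordered columns and enumerates the degree sequences $Ax$ of all graphs on a small number of nodes; the function names are illustrative assumptions, and the brute-force enumeration is only meant for very small $n$.

```python
from itertools import combinations, product


def design_matrix(n):
    """Node-edge incidence matrix of the complete graph on nodes 1..n.

    Rows are indexed by nodes, columns by pairs (i, j) with i < j in
    lexicographic order; an entry is 1 when the node belongs to the pair.
    """
    pairs = list(combinations(range(1, n + 1), 2))  # lexicographic order
    return [[1 if node in pair else 0 for pair in pairs]
            for node in range(1, n + 1)]


def degree_sequence(A, x):
    """Compute d(x) = A x for a 0/1 edge vector x."""
    return tuple(sum(a * xe for a, xe in zip(row, x)) for row in A)


def all_degree_sequences(n):
    """Set of degree sequences of all simple graphs on n labeled nodes."""
    A = design_matrix(n)
    m = n * (n - 1) // 2
    return {degree_sequence(A, x) for x in product((0, 1), repeat=m)}


A4 = design_matrix(4)
for row in A4:
    print(row)
```

For $n = 4$ the printed rows reproduce the matrix displayed above, and for $n = 3$ the enumeration returns as many distinct degree sequences as there are graphs on 3 nodes.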

The main result in this section is to show that existence of the MLE for the beta model can be fully
characterized using the polytope of degree sequences in the following fashion. For any $x \in S_n$, let

$$\tilde{p}_{i,j} := \frac{x_{i,j}}{N_{i,j}}, \quad i < j,$$

and set $\tilde{d} = \tilde{d}(x) \in \mathbb{R}^n$ to be the vector with coordinates

$$\tilde{d}_i := \sum_{j<i} \tilde{p}_{j,i} + \sum_{j>i} \tilde{p}_{i,j}, \quad i = 1, \ldots, n. \tag{6}$$

Notice that $\tilde{d}$ is just a rescaled version of the sufficient statistics (5), normalized by the number of
observations. It is also clear that, for the random graph model, $\tilde{d} = d$.

Theorem 3.1. Let $x \in S_n$ be the observed vector of edge counts. The MLE exists if and only if $\tilde{d}(x) \in \mathrm{int}(P_n)$.

Remark

Theorem 3.1 verifies the conjecture contained in Addenda A in Chatterjee et al. (2011): for the random
graph model, the MLE exists if and only if the degree sequence belongs to the interior of $P_n$. This result
follows from the standard properties of exponential families: see Theorem 9.13 in Barndorff-Nielsen (1978)
or Theorem 5.5 in Brown (1986). The theorem also confirms the observation made by Chatterjee et al.
(2011) that the MLE never exists if $n = 3$: indeed, since $P_3$ has exactly 8 vertices, as many as the possible
graphs on 3 nodes, no degree sequence can be inside $P_3$.

The geometric nature of Theorem 3.1 has important consequences. First, it provides the algorithmic basis
for detecting existence of the MLE, as discussed in the appendix. Secondly, and quite importantly, it allows
us to identify the patterns of observed edge counts that cause nonexistence of the MLE, i.e. the sample points
for which the MLE is undefined. This is done in the next result.

Table 1: Example of a co-facial set leading to a nonexistent MLE.

Lemma 3.2. A point $y$ belongs to the interior of some face $F$ of $P_n$ if and only if there exists a set $\mathcal{F} \subset \{(i,j), i < j\}$ such that

$$y = Ap, \tag{7}$$

where $p = \{p_{i,j} : i < j, p_{i,j} \in [0, 1]\} \in \mathbb{R}^{\binom{n}{2}}$ is such that $p_{i,j} \in \{0, 1\}$ if $(i,j) \notin \mathcal{F}$ and $p_{i,j} \in (0, 1)$ if $(i,j) \in \mathcal{F}$.
The set $\mathcal{F}$ is uniquely determined by the face $F$ and is the maximal set for which (7) holds.

Following Geiger et al. (2006) and Fienberg and Rinaldo (2011), we call any such set $\mathcal{F}$ a facial set of $P_n$
and its complement, $\mathcal{F}^c = \{(i,j) : i < j\} \setminus \mathcal{F}$, a co-facial set. Facial sets form a lattice that is isomorphic to
the face lattice of $P_n$, as shown by Fienberg and Rinaldo (2011, Lemma 3.4). This means that the faces of $P_n$
are in one-to-one correspondence with the facial sets of $P_n$ and, for any pair of faces $F$ and $F'$ of $P_n$ with
associated facial sets $\mathcal{F}$ and $\mathcal{F}'$, $F \cap F' = \emptyset$ if and only if $\mathcal{F} \cap \mathcal{F}' = \emptyset$, and $F \subset F'$ if and only if $\mathcal{F} \subset \mathcal{F}'$. In
particular, for a point $x \in S_n$, $d(x) = Ax$ belongs to the interior of a face $F$ of $P_n$ if and only if there exists a
non-negative $p$ such that $d(x) = Ap$, where $\mathcal{F} = \{(i, j) : p_{i,j} > 0\}$ is the facial set corresponding to $F$. By the
same token, $y \in \mathrm{int}(P_n)$ if and only if $y = Ap$ for a vector $p$ whose coordinates are strictly between 0 and 1.

Facial sets are combinatorial objects that have statistical relevance for two reasons. First, non-existence
of the MLE can be described combinatorially in terms of co-facial sets, i.e. patterns of edge counts that are
either $0$ or $N_{i,j}$. In particular, the MLE does not exist if and only if the set $\{(i, j) : i < j, x_{i,j} = 0 \text{ or } N_{i,j}\}$
contains a co-facial set. Secondly, apart from exhausting all possible patterns of forbidden entries in the
table leading to a nonexistent MLE, facial sets specify which probability parameters are estimable. In fact,
inspection of the likelihood function (3) reveals that, for any observable set of counts $\{x_{i,j} : i < j\}$, there
always exists a unique set of maximizers $\hat{p} = \{\hat{p}_{i,j}, i < j\}$ which, by strict concavity, are uniquely determined
by the first order optimality conditions

$$\tilde{d}(x) = A\hat{p},$$

where $\tilde{d}(x)$ is the vector of normalized sufficient statistics in (6). These conditions are also known as the moment
equations. Existence of the MLE is then equivalent to $0 < \hat{p}_{i,j} < 1$ for all $i < j$.
When the MLE does not exist, i.e. when $\tilde{d}$ is on the boundary of $P_n$, the moment equations still hold, but
the entries of the optimizer $\{\hat{p}_{i,j}, i < j\}$, known as the extended MLE, are no longer strictly between 0 and 1.
Instead, by Lemma 3.2, the extended MLE is such that $\hat{p}_{i,j} = x_{i,j}/N_{i,j} \in \{0, 1\}$ for all $(i, j) \in \mathcal{F}^c$. Furthermore,
it is possible to show (see, e.g., Morton, 2008) that $\hat{p}_{i,j} \in (0, 1)$ for all $(i,j) \in \mathcal{F}$. Therefore, when the MLE
does not exist, only the probabilities $\{p_{i,j}, (i,j) \in \mathcal{F}\}$ are estimable.

Therefore, while co-facial sets encode the patterns of table entries leading to a non-existent MLE, facial 
sets indicate which probability parameters are estimable. A similar, though more involved interpretation 
holds for the estimability of the natural parameters, for which the reader is referred to Fienberg and Rinaldo 
(2011). 
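
To make the moment equations concrete in the random graph case ($N_{i,j} = 1$), the sketch below solves them by a simple fixed-point iteration obtained by rearranging $d_i = \sum_{j \neq i} p_{i,j}$ under (1). This is an illustrative scheme in the spirit of the algorithm of Chatterjee et al. (2011), not the algorithm described in the appendix; the function names, the starting point and the iteration count are assumptions, and no convergence guarantee is claimed when the degree sequence lies on the boundary of the polytope.

```python
import math


def fit_beta(d, iters=500):
    """Fixed-point iteration for the moment equations d_i = sum_j p_ij.

    Rearranging (1) gives
        beta_i = log d_i - log sum_{j != i} 1 / (exp(-beta_j) + exp(beta_i)).
    Assumes all degrees are strictly positive and N_ij = 1.
    """
    n = len(d)
    beta = [0.0] * n
    for _ in range(iters):
        # Simultaneous update: the comprehension reads the previous beta only.
        beta = [math.log(d[i])
                - math.log(sum(1.0 / (math.exp(-beta[j]) + math.exp(beta[i]))
                               for j in range(n) if j != i))
                for i in range(n)]
    return beta


def expected_degrees(beta):
    """Right-hand side of the moment equations: sum_j p_ij for each node i."""
    n = len(beta)
    return [sum(1.0 / (1.0 + math.exp(-(beta[i] + beta[j])))
                for j in range(n) if j != i)
            for i in range(n)]


# Degree sequence of the 4-cycle, which lies in the interior of the polytope.
beta_hat = fit_beta([2, 2, 2, 2])
print(beta_hat, expected_degrees(beta_hat))
```

For the 4-cycle every edge probability must equal 2/3 by symmetry, so each coordinate of the fitted vector should be close to $\frac{1}{2}\log 2$ and the fitted expected degrees should be close to 2.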

Below, we further investigate the properties of $P_n$ and provide several examples of co-facial sets
associated to the facets of $P_n$.

3.1 The Co-facial Sets of $P_n$

Theorem 3.1 and Lemma 3.2 both show that the boundary of the polytope $P_n$ plays a fundamental role in
determining the existence of the MLE for beta models and in specifying which parameters are estimable.

Table 2: Left: data exhibiting the pattern reported in Table 1, when $N_{i,j} = 3$ for all $i \neq j$. Right: table
of the extended MLE of the estimated probabilities. Under the natural parametrization, the supremum
of the log-likelihood is achieved in the limit for any sequence of natural parameters $\{\beta^{(k)}\}$ of the form
$\beta^{(k)} = ([\text{illegible}], -c_k, c_k, c_k)$, where $c_k \to \infty$ as $k \to \infty$.

    Counts (left)             Probabilities (right)
    x   2   1   2             x      0.225  0.384  0.725
    [row illegible]           0.775  x      0.225  0.551
    2   3   x   3             0.616  0.775  x      0.725
    1   2   [illegible]  x    0.275  0.449  0.275  x

Table 3: Left: same data as in Table 2, but with the values for the cells (1, 2) and (2, 3) switched with the
values in the cells (2, 1) and (3, 2), respectively. Right: table of probabilities at which the log-likelihood is
optimal. The MLE of the natural parameters is $\hat{\beta} = (-0.237, -1.002, -0.237, 1.205)$.

Mahadev and Peled (1996) have fully characterized the boundary of $P_n$ and derived the facet-defining
inequalities of $P_n$, for all $n \geq 4$ (when $n \leq 3$ the problem is of little interest). For the reader's convenience,
we report this result below. Let $\mathcal{P}$ be the set of all pairs $(S, T)$ of disjoint non-empty subsets of $\{1, \ldots, n\}$,
such that $|S \cup T| \in \{2, \ldots, n - 3, n\}$. For any $(S, T) \in \mathcal{P}$ and $y \in P_n$, let

$$g(S, T, y, n) := |S|(n - 1 - |T|) - \sum_{i \in S} y_i + \sum_{i \in T} y_i. \tag{8}$$

Theorem 3.3 (Theorem 3.3.17 in Mahadev and Peled (1996)). Let $n \geq 4$ and $y \in P_n$. The facet-defining
inequalities of $P_n$ are

(i) $y_i \geq 0$, for $i = 1, \ldots, n$;

(ii) $y_i \leq n - 1$, for $i = 1, \ldots, n$;

(iii) $g(S, T, y, n) \geq 0$, for all $(S, T) \in \mathcal{P}$.
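
Combined with Theorem 3.1, the facet description suggests a direct, if exponentially expensive, check for existence of the MLE: a point lies in the interior of the polytope exactly when every facet-defining inequality holds strictly. The following Python sketch implements this brute-force check for small $n$; the function names and the 0-based node labels are illustrative assumptions, and this is not the algorithm proposed in the appendix.

```python
from itertools import combinations


def facet_pairs(n):
    """Yield all pairs (S, T) of disjoint non-empty subsets of {0, ..., n-1}
    with |S u T| in {2, ..., n-3, n}, as in the definition preceding (8)."""
    sizes = [m for m in range(2, n + 1) if m <= n - 3 or m == n]
    for m in sizes:
        for U in combinations(range(n), m):
            for k in range(1, m):
                for S in combinations(U, k):
                    T = tuple(v for v in U if v not in S)
                    yield S, T


def g(S, T, y, n):
    """Left-hand side of inequality (iii), i.e. equation (8)."""
    return (len(S) * (n - 1 - len(T))
            - sum(y[i] for i in S) + sum(y[i] for i in T))


def in_interior(y):
    """True when every facet inequality of Theorem 3.3 holds strictly.

    Exact for integer or rational input; floating-point input very close
    to the boundary would need a tolerance.
    """
    n = len(y)
    if n < 4:
        raise ValueError("Theorem 3.3 requires n >= 4")
    if any(not (0 < yi < n - 1) for yi in y):
        return False
    return all(g(S, T, y, n) > 0 for S, T in facet_pairs(n))


print(in_interior((2, 2, 2, 2)))  # degree sequence of the 4-cycle
print(in_interior((1, 2, 2, 1)))  # degree sequence of the path on 4 nodes
```

In this example the 4-cycle gives an interior point, while the path on 4 nodes attains equality in (iii) with $S$ the two inner nodes and $T$ the two end nodes, even though all of its degrees are strictly between 0 and $n - 1$.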

The combinatorial complexity of the face lattice of an $n$-dimensional polytope can be summarized by its
$f$-vector, the vector of length $n + 1$ whose $i$-th entry contains the number of $i$-dimensional faces, $i = 0, \ldots, n$.
Stanley (1991) studies the number of faces of the polytope of degree sequences $P_n$ and derives an expression
for computing the entries of the $f$-vector of $P_n$. For example, the $f$-vector of $P_8$ is the 9-dimensional vector

(334982, 1726648, 3529344, 3679872, 2074660, 610288, 81144, 3322, 1),

so $P_8$ is an 8-dimensional polytope with 334982 vertices, 1726648 edges, and so on, up to 3322 facets. Also,
according to Stanley's formula, the number of facets of $P_4$, $P_5$, $P_6$ and $P_7$ are 22, 60, 224 and 882, respectively.
These numbers correspond to the numbers we obtained with polymake, using the methods described in the
appendix.
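
The facet counts quoted above can be cross-checked against Theorem 3.3 by counting its inequalities: $2n$ inequalities of types (i) and (ii), plus one inequality of type (iii) for each ordered pair $(S, T)$. The short Python sketch below does this count, under the assumption that the listed inequalities define pairwise distinct facets; the function name is illustrative.

```python
from math import comb


def num_facets(n):
    """Number of inequalities listed in Theorem 3.3 for n >= 4.

    For each admissible size m of S u T there are C(n, m) choices of the
    union and 2**m - 2 ways to split it into non-empty S and T.
    """
    sizes = [m for m in range(2, n + 1) if m <= n - 3 or m == n]
    return 2 * n + sum(comb(n, m) * (2 ** m - 2) for m in sizes)


print([num_facets(n) for n in range(4, 9)])  # → [22, 60, 224, 882, 3322]
```

The resulting counts agree with the numbers of facets reported for $P_4$ through $P_8$, and they make the exponential growth in $n$ explicit.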

Despite the fact that much is known about $P_n$, the number of facet-defining inequalities appears to be
exponential in $n$ and, consequently, the tasks of identifying points on the boundary of $P_n$ and the associated
facial set remain computationally challenging. In the appendix, we discuss these difficulties and propose an
algorithm for detecting boundary points and the associated facial sets that is based on a log-linear model 
reparametrization. Using the methods described there, we were able to identify a few interesting cases in 
which the MLE is nonexistent, most of which seem to be unaccounted for in the statistical literature. Below 
we describe some of our computations. 
Table 4: Example of a co-facial set leading to a nonexistent MLE. In this case $\tilde{d}_2 = 0$.

Table 5: Example of a co-facial set leading to a nonexistent MLE. In this case the second row sum is 0.

Table 6: Example of a co-facial set leading to a nonexistent MLE.

Recall that the data can be represented as an $n \times n$ table of counts, in which the diagonal elements are
expunged and where the $(i,j)$-th entry of the table indicates the number of times, out of $N_{i,j}$, in which
the edge $(i,j)$ was observed. In our examples, empty cells correspond to facial sets and may contain any
count values, in contrast to the cells in the co-facial sets that contain either a zero value or a maximal value,
namely $N_{i,j}$. As we saw in Lemma 3.2, extreme count values of this nature are precisely what leads to a
nonexistent MLE.

Table 1 provides an instance of a co-facial set, which corresponds to a facet of $P_4$. Assume for simplicity
that each of the empty cells contains counts bounded away from $0$ and $N_{i,j}$. Then the normalized sufficient
statistics $\tilde{d}$ are also bounded away from $0$ and $n - 1$, and so are the row and column sums of the normalized counts
$\{x_{i,j}/N_{i,j} : i \neq j\}$, yet the MLE does not exist. This is further illustrated in Table 2, which shows, on the left,
an instance of data with $N_{i,j} = 3$ for all $i \neq j$, satisfying the pattern indicated in Table 1 and, on the right,
the probability values maximizing the log-likelihood function. Since the MLE does not exist, some of these
probability values are 0 and 1. The order of the pattern is crucial. Indeed, Table 3 shows, on the left, data
containing precisely the same counts as in Table 2, but with the values in cells (1, 2) and (2, 3) switched with
the values in cells (2, 1) and (3, 2), respectively. On the right of Table 3 the MLE of the cell probabilities are
shown; as the MLE exists, they are bounded away from 0 and 1.

In Table 4 we show another example of a co-facial set that is easy to detect, since it corresponds to a value
of 0 for the normalized sufficient statistic $\tilde{d}_2$. Indeed, from cases (i) and (ii) of Theorem 3.3, the MLE does
not exist if $\tilde{d}_i = 0$ or $\tilde{d}_i = n - 1$, for some $i$. Table 5 shows yet one more example of a co-facial set that is easy
to detect, as it leads to a zero row margin for the second row. Finally, Table 6 provides one more example
of a co-facial set, which, unlike the ones in Tables 4 and 5, has normalized row sums and the normalized
sufficient statistics bounded away from 0 and $n - 1$. In Table 7 we list all 22 co-facial sets associated with
the facets of $P_4$, including the cases already shown in Tables 1, 4, 5 and 6.

In general, there are $2n$ facets of $P_n$ that are determined by $\tilde{d}_i$ equal to $0$ or $n - 1$, and $2n$ other facets
associated to values of the normalized row sums equal to $0$ or $n - 1$. Thus, just by inspecting the row sums or
the observed sufficient statistics, one can detect $4n$ co-facial sets associated to as many facets of $P_n$. However,
comparing this number to the entries of the $f$-vector calculated in Stanley (1991), and as our computations
confirm, most of the facets of $P_n$ do not yield co-facial sets of this form. Since the number of facets appears
to grow exponentially in $n$, we conclude that most of the co-facial sets do not appear to arise in this fashion.

Table 7: All possible co-facial sets for $P_4$ (empty cells indicate any entry values).

3.2 Random Graphs with Nonexistent MLEs 

When dealing with the special case of $N_{i,j} = 1$ for all i < j, which we showed to be equivalent to a model for
random undirected graphs, points on the boundary of $P_n$ are, by construction, degree sequences and have a
direct graph-theoretical interpretation, as shown in the next result.

Table 8: Patterns of zeros and ones yielding random graphs with non-existent MLE (empty cells indicate that
the entry could be a 0 or a 1).

Lemma 3.4 (Lemma 3.3.13 in Mahadev and Peled (1996)). Let d be a degree sequence of a graph G that lies
on the boundary of $P_n$. Then either $d_i = 0$ or $d_i = n - 1$ for some i, or there exist non-empty and disjoint
subsets S and T of {1, ..., n} such that

1. S is a clique of G;

2. T is a stable set of G;

3. every vertex in S is adjacent to every vertex in $(S \cup T)^c$ in G;

4. no vertex of T is adjacent to any vertex of $(S \cup T)^c$ in G.

A direct consequence of Lemma 3.4 is that the MLE does not exist if the observed network is a split graph,
i.e. a graph whose node set can be partitioned into a clique S and a stable set T. More generally, Lemma
3.4 can be used to create virtually any example of random graphs with fixed degree sequences for which the
MLE does not exist. Notice that, in particular, having node degrees bounded away from 0 and n - 1 is not a
sufficient condition for the existence of the MLE (though its violation implies nonexistence of the MLE). We
point out, however, that Lemma 3.4 is of little help in detecting boundary points and the associated
co-facial sets.
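The four conditions of Lemma 3.4 are easy to verify for a candidate pair (S, T), even though finding such a pair is the hard part. The following is a minimal sketch, not part of the original computations: the function name, the 0-based node labels and the 4-node example graph are all illustrative assumptions.

```python
from itertools import combinations


def satisfies_lemma_3_4(adj, S, T):
    """Check conditions 1-4 of Lemma 3.4 for node sets S and T.

    adj is a symmetric 0/1 adjacency matrix (list of lists); nodes are
    labelled 0, ..., n-1 here, whereas the text uses 1, ..., n.
    """
    n = len(adj)
    S, T = set(S), set(T)
    if not S or not T or S & T:
        return False  # S and T must be non-empty and disjoint
    rest = set(range(n)) - S - T  # the complement of S u T
    is_clique = all(adj[u][v] for u, v in combinations(sorted(S), 2))
    is_stable = not any(adj[u][v] for u, v in combinations(sorted(T), 2))
    s_sees_rest = all(adj[u][w] for u in S for w in rest)
    t_misses_rest = not any(adj[v][w] for v in T for w in rest)
    return is_clique and is_stable and s_sees_rest and t_misses_rest


# Hypothetical example: the path 0 - 2 - 3 - 1 on four nodes.
edges = [(0, 2), (2, 3), (3, 1)]
adj = [[0] * 4 for _ in range(4)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1
degrees = [sum(row) for row in adj]  # every degree lies strictly between 0 and 3
boundary_witness = satisfies_lemma_3_4(adj, {2, 3}, {0, 1})
```

In this example every degree is strictly between 0 and n - 1 = 3, yet S = {2, 3} and T = {0, 1} satisfy all four conditions, so the graph is a split graph of the kind discussed above.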

Below, we provide some examples of co-facial sets for random graphs with fixed degree sequences for
which the MLE does not exist, yet the node degrees are bounded away from 0 and n - 1.

For the case n = 4, our computations show that there are 14 distinct co-facial sets associated to the facets
of $P_n$. Eight of them correspond to degree sequences containing a 0 or a 3, and the remaining six are shown
in Table 8, which we computed numerically using the procedure described in the appendix. Notice that
the three tables on the second row are obtained from the first three tables by switching zeros with ones.
Furthermore, the number of co-facial sets we found is smaller than the number of facets of $P_n$, which is
22, as shown in Table 7. This is a consequence of the fact that the only observed counts in the random graph
model are 0's or 1's: it is in fact easy to see in Table 7 that any co-facial set containing three zero counts
and three maximal counts $N_{i,j}$ is equivalent, in the random graph case, to a node having degree 0 or 3.
However, as soon as $N_{i,j} \geq 2$, the number of possible co-facial sets matches the number of facets of $P_n$.

Table 9 shows an observed graph with degrees all larger than 0 and less than 3 but for which the MLE
does not exist. Notice that the co-facial set corresponds to the one shown in the upper left corner of Table 8.
Finally, Tables 10 and 11 show two more examples of random graphs on n = 5 and n = 6 nodes, respectively,
for which the MLE does not exist (by Lemma 3.4), and yet the degrees are such that $0 < d_i < n - 1$ for all i.

Table 9: Random graph with node degrees larger than 0 and smaller than 3 exhibiting the same co-facial set
shown in the upper left corner of Table 8. In this case, Lemma 3.4 applies with S = {3, 4} and T = {1, 2}.

Table 10: Network with n = 5 for which the MLE does not exist and the degrees are bounded away from 0
and 4. In this case, Lemma 3.4 applies with S = {2, 3, 4} and T = {1, 5}.

Table 11: Network with n = 6 for which the MLE does not exist and the degrees are bounded away from 0
and 5. In this case, Lemma 3.4 applies with S = {1, 2, 6} and T = {3, 4, 5}.
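For very small n, membership in the interior of the polytope can also be checked by brute force over all pairs (S, T). The sketch below is illustrative only. It assumes that g in (8) has the form g(S, T, d, n) = |S|(n - 1 - |T|) - sum_{i in S} d_i + sum_{i in T} d_i, which is the expression appearing in the comparison with Chatterjee et al. (2011) in Section 4.1, and that the input d is already a point of the polytope (for instance, an observed degree sequence). The function names are hypothetical, and the enumeration visits on the order of 3^n pairs, so this is not the procedure of the appendix.

```python
from itertools import combinations


def g(S, T, d, n):
    """Slack of the inequality indexed by the disjoint sets S and T (assumed form of (8))."""
    return len(S) * (n - 1 - len(T)) - sum(d[i] for i in S) + sum(d[i] for i in T)


def facet_slacks(d):
    """Yield (S, T, slack) for every pair of disjoint sets with S u T non-empty."""
    n = len(d)
    nodes = range(n)
    for s in range(n + 1):
        for S in combinations(nodes, s):
            rest = [v for v in nodes if v not in S]
            for t in range(len(rest) + 1):
                if s + t == 0:
                    continue  # skip the pair of empty sets
                for T in combinations(rest, t):
                    yield S, T, g(S, T, d, n)


def in_interior(d):
    """True if every inequality is strict, i.e. d is an interior point."""
    return all(slack > 0 for _, _, slack in facet_slacks(d))


on_boundary = not in_interior([1, 1, 2, 2])   # path on four nodes (0-based labels)
regular_ok = in_interior([2, 2, 2, 2, 2])     # degree sequence of a 5-cycle
```

For the degree sequence (1, 1, 2, 2), the pair S = {2, 3}, T = {0, 1} (0-based labels) has zero slack, so the point is on the boundary even though no degree equals 0 or n - 1.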

4 Existence of the MLE: Asymptotics 

In this section we derive sufficient conditions that imply existence of the MLE with large probability as the
size of the network n grows. We will make the simplifying assumption that $N_{i,j} = N_n$ for all i and j, where
$N_n \geq 1$ could itself depend on n.

Recall the random vector $\tilde d$, whose coordinates are given in (6), and set $\bar d = E[\tilde d] \in \mathbb{R}^n$. Then

$\bar d_i = \sum_{j < i} p_{j,i} + \sum_{j > i} p_{i,j}, \quad i = 1, \ldots, n.$

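We formulate sufficient conditions for the existence of the MLE in terms of the entries of the vector $\bar d$.

Theorem 4.1. Assume that, for all $n \geq \max\{4, 2\sqrt{c\, n \log n / N_n} + 1\}$, the vector $\bar d$ satisfies the conditions

(i) $\min_i \min\{\bar d_i, n - 1 - \bar d_i\} \geq 2\sqrt{c\, n \log n / N_n} + C$,

(ii) $\min_{(S,T) \in \mathcal{P}} g(S, T, \bar d, n) > |S \cup T| \sqrt{c\, n \log n / N_n} + C$,

where $c > 1/2$ and $C \in \left(0, \frac{n-1}{2} - \sqrt{c\, n \log n / N_n}\right)$. Then, with probability at least $1 - \frac{2}{n^{2c-1}}$, the MLE exists.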

When $N_n$ is constant, for instance when $N_n = 1$, as in the random graph case, the conditions of Theorem 4.1
can be relaxed by requiring condition (ii) to hold only over subsets S and T of cardinality of order
$\Omega(\sqrt{n \log n})$. While we present this result in greater generality by assuming only $n \geq N_n$, we do not expect
it to be sharp in general when $N_n$ grows with n.

Corollary 4.2. Let $n \geq \max\{N, 4, 2\sqrt{c\, n \log n} + 1\}$, $c > 1$ and $C \in \left(0, \frac{n-1}{2} - \sqrt{c\, n \log n}\right)$. Assume the vector
$\bar d = E[\tilde d] \in \mathbb{R}^n$ satisfies the conditions

(i') $\min_i \min\{\bar d_i, n - 1 - \bar d_i\} \geq 2\sqrt{c\, n \log n} + C$;

(ii') $\min_{(S,T) \in \mathcal{P}_n} g(S, T, \bar d, n) > |S \cup T| \sqrt{c\, n \log n} + C$,

where

$\mathcal{P}_n := \{(S, T) \in \mathcal{P} : \min\{|S|, |T|\} > \sqrt{c\, n \log n} + C\}$,

and the set $\mathcal{P}$ was defined before Theorem 3.3. Then, the MLE exists with probability at least 1 - ni^-i ■ If
N = 1, it is enough to have c > 1/2, and the MLE exists with probability larger than 1 -

4.1 Discussion and Comparisons with Previous Works 

It is clear that, asymptotically, the value of the constant C in both Theorem 4.1 and Corollary 4.2 becomes
irrelevant, as the constraints on its range will be satisfied by any positive C, for all n large enough.

Since $|S \cup T| \leq n$, one could replace assumption (ii) of Theorem 4.1 with the simpler but stronger
condition

$\min_{(S,T) \in \mathcal{P}} g(S, T, \bar d, n) > n^{3/2} \sqrt{c \log n / N_n} + C.$

Then, assuming for simplicity that $N_n$ is a constant, as in Corollary 4.2, the MLE exists with probability
tending to one at a rate that is polynomial in n whenever

$\min_i \min\{\bar d_i, n - 1 - \bar d_i\} = \Omega(\sqrt{n \log n})$

and, for all pairs $(S, T) \in \mathcal{P}$,

$g(S, T, \bar d, n) = \Omega(n^{3/2} \sqrt{\log n}).$

For the case $N_n = 1$, Corollary 4.2 should be compared with Theorem 3.1 in Chatterjee et al. (2011),
which also provides sufficient conditions for the existence of the MLE with probability no smaller than
1 - „:ie-i (for all n large enough), but whose conditions appear to be stronger than ours. In detail, their conditions require
that, for some constants $c_1$, $c_2$ and $c_3$ in (0, 1), $c_1(n - 1) < d_i < c_2(n - 1)$ for all i and

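$|S|(|S| - 1) - \sum_{i \in S} d_i + \sum_{i \notin S} \min\{d_i, |S|\} > c_3 n^2$, (9)

for all sets S such that $|S| > (c_1)^2 n$. It is easy to see that, for any non-empty subsets $S \subset \{1, \ldots, n\}$ and
$T \subset \{1, \ldots, n\} \setminus S$,

$\sum_{i \notin S} \min\{d_i, |S|\} \leq \sum_{i \in T} d_i + |S| \, |(S \cup T)^c|$,

which implies that

$|S|(n - 1 - |T|) - \sum_{i \in S} d_i + \sum_{i \in T} d_i \geq |S|(|S| - 1) - \sum_{i \in S} d_i + \sum_{i \notin S} \min\{d_i, |S|\}$,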

where we have used the equality $n = |S| + |T| + |(S \cup T)^c|$. Thus, if (9) holds for some non-empty
$S \subset \{1, \ldots, n\}$, then S satisfies the facet conditions implied by all the pairs (S, T), for any non-empty set
$T \subset \{1, \ldots, n\} \setminus S$. As a result, for any subset S, (9) is a stronger condition than any of the facet conditions of
$P_n$ specified by S. In addition, we weakened significantly their requirement that $c_1(n - 1) < d_i < c_2(n - 1)$
for all i to $\min_i \min\{d_i, n - 1 - d_i\} > 2\sqrt{c\, n \log n} + C$. As a direct consequence of this weakening, in our
analysis we only need $|S| > \sqrt{c\, n \log n} + C$ as opposed to $|S| > (c_1)^2 n$. Overall, in our setting, the vector
of expected degrees of the sequence of networks is allowed to lie much closer to the boundary of $P_n$. As
we explain next, such weakening is significant, since the setting of Chatterjee et al. (2011) only allows one to
estimate an increasing number of probability parameters (the edge probabilities) that are uniformly bounded
away from 0 and 1, while our assumptions allow for these probabilities to become degenerate as the network
size grows.

The non-degenerate case 

We now briefly discuss the case of sequences of networks for which $N_n = 1$ and the edge probabilities are
uniformly bounded away from 0 and 1, i.e.

$\delta < p_{i,j} < 1 - \delta, \quad \forall i, j$, (10)

for some $\delta \in (0, 1)$ independent of n. In this scenario, the number of probability parameters to be estimated
grows with n, but their values are guaranteed to be non-degenerate. It immediately follows from the
non-degenerate assumption (10) that $\bar d \in \mathrm{int}(P_n)$ and

$\delta(n - 1) < \bar d_i < (1 - \delta)(n - 1), \quad i = 1, \ldots, n$. (11)

Then, the same arguments used in the proof of Corollary 4.2 imply that the MLE exists with high probability.
We only provide a sketch of the proof. First, we note that, with high probability,
$g(S, T, \tilde d, n) > g(S, T, \bar d, n) - |S \cup T| \, O(\sqrt{n \log n})$ for each pair $(S, T) \in \mathcal{P}$. Furthermore, because of (11),
it is enough to consider only pairs (S, T) of disjoint subsets of {1, ..., n} of sizes of order $\Omega(n)$. For each such
pair, the condition on $\bar d_i$ further yields that $g(S, T, \bar d, n)$ is of order $\Omega(n^2)$, and, by Theorem 8 the MLE exists
with high probability.

In fact, the boundedness assumption of Chatterjee et al. (2011) that $\|\beta\|_\infty < L$, with L independent of
n, is equivalent to the non-degenerate assumption (10), as can be easily seen from equation (1). Unlike
the analysis of Chatterjee et al. (2011), which focuses on the non-degenerate case, our results hold under
weaker scaling, as we only require, for instance, that $\bar d_i$ be of order $\Omega(\sqrt{n \log n})$ for all i.

Finally, we note that the tameness condition of Barvinok and Hartigan (2010) is equivalent to
$\delta < \hat p_{i,j} < 1 - \delta$ for all i and j and a fixed $\delta \in (0, 1)$, where $\hat p_{i,j}$ is the MLE of $p_{i,j}$. Therefore, the tameness condition
is stronger than existence of the MLE. In fact, using again Theorem 1.3 in Chatterjee et al. (2011), for all
n sufficiently large, the tameness condition is equivalent to the boundedness condition of Chatterjee et al.
(2011).

We conclude this section with two final remarks. First, Theorem 1.3 in Chatterjee et al. (2011) shows
that, when the MLE exists, $\max_i |\hat\beta_i - \beta_i| = O(\sqrt{\log n / n})$, with probability at least 1 - ■ Combined
with our Corollary 4.2, this implies that the MLE is a consistent estimator under a growing network size
and with edge probabilities approaching the degenerate values of 0 and 1. Secondly, after the submission
of this article we learned about the interesting asymptotic results of Yan and Xu (2012) and Yan et al. (2012),
who claim that, based on a modification of the arguments of Chatterjee et al. (2011), it is possible to show that
the MLE of the β-model exists and is uniformly consistent if L = o(log n) and L = o(log log n), respectively,
where $L = \max_i |\beta_i|$.

5 Computations 

The main difficulty in applying the theory presented so far is that the polytope of degree sequences $P_n$ is in
general difficult to handle algorithmically. Indeed, $P_n$ arises as a Minkowski sum and, even though the system of
defining inequalities is given explicitly, its combinatorial complexity grows exponentially in n. Furthermore,
the vertices of $P_n$ are not known explicitly. Algorithms for obtaining the vertices of $P_n$, such as minksum (see
Weibel, 2005), are computationally expensive and require generating all the points $\{Ax, x \in \mathcal{G}_n\}$, where
$|\mathcal{G}_n| = 2^{\binom{n}{2}}$. Even when n is as small as 10, this is generally not feasible. See, for instance, Table 6.4 below.
Thus, deciding whether a given degree sequence is a point in the interior of $P_n$ and identifying the facial set
corresponding to an observed degree sequence on the boundary of $P_n$ are both computationally challenging tasks.
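To give a sense of scale, the number of labelled undirected simple graphs on n nodes, 2 raised to the power n choose 2, can be tabulated directly. This short sketch is only an illustration of the count quoted above; the function name is hypothetical.

```python
from math import comb


def num_graphs(n):
    """Number of labelled undirected simple graphs on n nodes: 2 ** C(n, 2)."""
    return 2 ** comb(n, 2)


# Tabulate the count for a few small values of n.
counts = {n: num_graphs(n) for n in (4, 6, 8, 10)}
for n, value in counts.items():
    print(n, value)
```

Already for n = 10 the count is 2^45, which is why enumerating all the points Ax is impractical.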

Our strategy to overcome these problems entails re-expressing the beta model as a log-linear model
with $\binom{n}{2}$ product-multinomial sampling constraints. This approach is not new, and it harks back to the
earlier re-expression of the Holland-Leinhardt $p_1$ model and its natural generalizations as log-linear models
(Fienberg and Wasserman, 1981a,b; Fienberg et al., 1985). Though this re-parametrization increases the
dimensionality of the problem, it nonetheless has the crucial computational advantage of reducing the
determination of the facial sets of $P_n$ to the determination of the facial sets of a pointed polyhedral cone spanned
by n(n - 1) vectors, which is a much simpler object to analyze, both theoretically and algorithmically. This
procedure is known as the Cayley embedding in polyhedral geometry, and its use in the analysis of log-linear
models is described in Fienberg and Rinaldo (2011). The advantages of this re-parametrization are two-fold.
First, it allows us to use the highly optimized algorithms available in polymake for listing explicitly all the
facial sets of $P_n$. This is how we computed the facial sets in all the examples presented in this article.
Secondly, the general algorithms for detecting nonexistence of the MLE and identifying facial sets proposed in
Fienberg and Rinaldo (2011), which can handle larger dimensional models, can be directly applied to this
problem. This reference is also relevant for dealing with inference under a non-existent MLE.

The appendix describes the details of our computations and the associated algorithms. 

6 Applications and Extensions 

The main arguments that we have used to explore nonexistence of the MLE and parameter estimability in 
the beta model are rather general, as they pertain to all log-linear models (see, e.g., Fienberg and Rinaldo, 
2011). In this section we extend them to different models for networks. 

6.1 The Rasch model 

Just like in Section 3.2, necessary and sufficient conditions for the existence of the MLE of the Rasch model
parameters can also be formulated in geometric terms based on the polytope of degree sequences. In detail,
for a bipartition of the n nodes of the form I = {1, ..., k} and J = {k + 1, ..., n - 1, n}, where l = n - k,
let $P_{k,l} \subset \mathbb{R}^n$ denote the associated polytope of bipartite degree sequences, i.e. the convex hull of all degree
sequences of bipartite undirected simple graphs on n nodes, with the bipartition specified by I and J. Let d(x)
denote the degree sequence associated with the observed bipartite graph x. Then, a straightforward
application of Theorem 9.13 in Barndorff-Nielsen (1978) yields the following result.

Theorem 6.1. The MLE of the Rasch model parameters exists if and only if $d(x) \in \mathrm{ri}(P_{k,l})$.

The polytope of bipartite degree sequences was introduced by Hammer et al. (1990). We briefly recall
its properties (see Mahadev and Peled, 1996, Section 3.4 for more details). Let

$F_{I,J} := \{y \in P_n : g(I, J, y, n) = 0\}$

be the facet of $P_n$ specified by I and J, where g is given in (8) (the sets I and J can be interchanged). Also,
let $c \in \mathbb{R}^n$ be the vector with coordinates

$c_i = \begin{cases} k - 1 & i = 1, \ldots, k \\ 0 & i = k + 1, \ldots, n. \end{cases}$

The polytope of bipartite degree sequences $P_{k,l}$ is just the translate by c of the facet $F_{I,J}$, which implies,
in particular, that $\dim(P_{k,l}) = n - 1$ (this explains why, in Theorem 6.1, we used the correct notation $\mathrm{ri}(P_{k,l})$
instead of $\mathrm{int}(P_{k,l})$).

Theorem 6.2 (Theorem 3.4.4 in Mahadev and Peled (1996)). $P_{k,l} = \{y - c, y \in F_{I,J}\}$.
The previous result is rather useful: in order to determine whether the MLE fails to exist, i.e. whether the
degree sequence of the observed bipartite graph is on the relative boundary of $P_{k,l}$, one can use Lemma 3.4
as follows. First add an edge between each pair of nodes in I (so the graph is no longer bipartite). Then,
check whether there is a pair of sets S and T, different from I and J, for which the conditions of Lemma 3.4
apply. Thus, the MLE does not exist if and only if there exists a partition of the nodes into three non-empty
sets S, T and $(S \cup T)^c$, such that, with respect to this enlarged graph,

1. $S \subset I$ (hence S is complete);

2. $T \subset J$ (hence, T is stable);

3. every vertex of S is adjacent to every vertex in $(S \cup T)^c$;

4. no vertex in T is adjacent to any vertex in $(S \cup T)^c$.

In fact, the above conditions are equivalent to the conditions for existence of the MLE in the Rasch model
found independently by Haberman (1977) and Fischer (1981). Indeed, recall that Haberman's conditions are
as follows: the MLE does not exist if there exist sets A, B, C and D such that

1. $A \cup B = I$ and $C \cup D = J$, with $A \cap B = C \cap D = \emptyset$;

2. $A \neq \emptyset$ and $C \neq \emptyset$, or $B \neq \emptyset$ and $D \neq \emptyset$;

3. $x_{i,j} = 0$ for all $i \in A$ and $j \in C$;

4. $x_{i,j} = 1$ for all $i \in B$ and $j \in D$,

where x is the observed graph. Then, to see the equivalence, take S = B, T = C and $(S \cup T)^c = A \cup D$.
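For a small 0/1 table, Haberman's conditions can be checked by exhaustive search over the row and column splits. The sketch below is only an illustration under stated assumptions: it reads condition 1 as requiring (A, B) to partition the rows and (C, D) to partition the columns, and condition 4 as requiring ones on B x D. The function name and the two example tables are hypothetical, and the search is exponential in the table size.

```python
from itertools import product


def haberman_witness(x):
    """Return (A, B, C, D) satisfying Haberman's conditions for the 0/1 table x, or None."""
    k, l = len(x), len(x[0])
    for row_mask in product((0, 1), repeat=k):      # 1 puts a row in A, 0 in B
        A = [i for i in range(k) if row_mask[i]]
        B = [i for i in range(k) if not row_mask[i]]
        for col_mask in product((0, 1), repeat=l):  # 1 puts a column in C, 0 in D
            C = [j for j in range(l) if col_mask[j]]
            D = [j for j in range(l) if not col_mask[j]]
            if not ((A and C) or (B and D)):
                continue                            # condition 2 fails
            zeros_on_AC = all(x[i][j] == 0 for i in A for j in C)
            ones_on_BD = all(x[i][j] == 1 for i in B for j in D)
            if zeros_on_AC and ones_on_BD:
                return A, B, C, D
    return None


identity_table = [[1, 0], [0, 1]]     # no witness is found
triangular_table = [[1, 1], [0, 1]]   # a witness is found
no_witness = haberman_witness(identity_table)
witness = haberman_witness(triangular_table)
```

A returned witness certifies nonexistence of the MLE in the sense of the conditions above; for the triangular table the second column consists only of ones, which is one of the degenerate patterns.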



6.2 Removing the Sampling Constraint in the Beta Model 

In this section we analyze the behavior of the generalized beta model when the number of recorded edges for
each pair of nodes is also random. Specifically, we assume the numbers of observed edges $\{x_{i,j} : i \neq j\}$ are
realizations of n(n - 1) independent Poisson random variables with means $\{m_{i,j} : i \neq j\}$. As a result, the
quantities $\{N_{i,j}, i \neq j\}$ are now random and can be zero with positive probability. Unlike the beta model
described in Section 2, in this more general case $x_{j,i}$ is not determined by $x_{i,j}$, thus we need to account
for all possible quantities $\{x_{i,j}\}_{i \neq j}$. We index the points of this enlarged set of n(n - 1) numbers as pairs
$\{(x_{i,j}, x_{j,i}) : i < j\} \subset \mathbb{N}^2$, with the pairs ordered lexicographically based on (i, j).

In this setting, a natural generalization of the beta model is to consider a parametrization of the mean edge
counts by points $\alpha \in \mathbb{R}^n$ and $\gamma \in \mathbb{R}^n$ so that

$\log m_{i,j} = \alpha_i + \gamma_j, \quad \forall i \neq j$. (12)

Table 12: Co-facial sets of the second kind, as specified in Theorem 6.3, for the case n = 4. Empty cells refer
to arbitrary entries.

Some algebra then shows that the probability of observing any point $x \in \mathbb{N}^{n(n-1)}$ is

$p(x) = \exp\left\{ \sum_{i=1}^n \alpha_i d_i^{out} + \sum_{i=1}^n \gamma_i d_i^{in} - \psi(\alpha, \gamma) \right\} \prod_{i \neq j} \frac{1}{x_{i,j}!}$, (13)

where the coordinates of the vectors of minimal sufficient statistics $d^{out} = d^{out}(x)$ and $d^{in} = d^{in}(x)$ are

$d_i^{out} := \sum_{j \neq i} x_{i,j}$ and $d_i^{in} := \sum_{j \neq i} x_{j,i}$, i = 1, ..., n,

respectively, and the log-partition function $\psi : \mathbb{R}^{2n} \to \mathbb{R}$ is given by $(\alpha, \gamma) \mapsto \sum_{i \neq j} \exp\{\alpha_i + \gamma_j\}$. The sufficient
statistics d = d(x) can be obtained as

d = Ax,

where A is a 2n × n(n - 1) matrix whose columns are indexed by the points in the sample space, and whose
rows are indexed by the parameters $\{\alpha_1, \ldots, \alpha_n, \gamma_1, \ldots, \gamma_n\}$. The entries of the row corresponding to $\alpha_i$ are
all zeros, except for the coordinates corresponding to the columns (i, j) with i < j and (j, i) with i > j, which
are ones. Similarly, the entries of the row corresponding to $\gamma_i$ are all zeros, except for the coordinates corresponding to the
columns (j, i) with i < j and (i, j) with i > j, which are ones. For instance, when n = 4,

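$A = \begin{pmatrix}
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 0
\end{pmatrix}$,

with rows ordered as $\alpha_1, \ldots, \alpha_4, \gamma_1, \ldots, \gamma_4$ and columns ordered as $x_{1,2}, x_{2,1}, x_{1,3}, x_{3,1}, x_{1,4}, x_{4,1}, x_{2,3}, x_{3,2}, x_{2,4}, x_{4,2}, x_{3,4}, x_{4,3}$.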

We remark that A is rank-deficient, as its rank is 2n - 1, which reflects the fact that the parametrization in
(12) is non-identifiable (this can be easily fixed by imposing, for instance, the constraint $\sum_i \alpha_i = 0$).
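The structure of A and its rank deficiency can be checked numerically for small n. The sketch below is an illustration, not the authors' code: it assumes the row order alpha_1, ..., alpha_n, gamma_1, ..., gamma_n and the column order (i, j), (j, i) for i < j taken lexicographically, with the column for x_{i,j} carrying a one in the rows of alpha_i and gamma_j, as dictated by (12). The helper names are hypothetical, and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction


def design_matrix(n):
    """Design matrix of the directed Poisson model (12), with 0-based node labels."""
    cols = []
    for i in range(n):
        for j in range(i + 1, n):
            cols += [(i, j), (j, i)]        # x_{i,j} followed by x_{j,i}
    A = [[0] * len(cols) for _ in range(2 * n)]
    for c, (src, dst) in enumerate(cols):
        A[src][c] = 1                       # row of alpha_src
        A[n + dst][c] = 1                   # row of gamma_dst
    return A, cols


def rank(M):
    """Rank by Gauss-Jordan elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r


A4, cols4 = design_matrix(4)
rank4 = rank(A4)   # expected to equal 2n - 1 when n = 4
```

Multiplying A by a vector x of edge counts, listed in the same column order, returns the out-degrees followed by the in-degrees.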

Notice that if the entries of x are all zeros and ones, i.e. $x \in \{0, 1\}^{n(n-1)}$, then x encodes a directed graph on
n nodes, with an arrow going from node i to node j if and only if $x_{i,j} = 1$ (thus, there may be two edges
connecting any pair of nodes, directed in opposite ways). In this case, the sufficient statistics $d^{out}$ and $d^{in}$
correspond to the out-degrees and in-degrees of the nodes.

Below we provide necessary and sufficient conditions for the existence of the MLE of $(\alpha, \gamma)$ or,
equivalently, of $\{m_{i,j} : i \neq j\}$ satisfying equation (12). To this end, let $C_A$ denote the polyhedral cone spanned by
the columns of A.

Theorem 6.3. Let $x \in \mathbb{N}^{n(n-1)}$ be the vector of observed edge counts. Then, the MLE exists if and only if $d(x) \in \mathrm{int}(C_A)$.
The polyhedral cone $C_A$ has 3n facets. The co-facial sets corresponding to the facets of $C_A$ can be classified as
follows:

1. the 2n support sets of the rows of A, each corresponding to a zero entry in the vectors of in-degree or
out-degree statistics;

2. n co-facial sets of the form $\{(i, j) : i \neq j, i \neq k, j \neq k\}$, one for each k = 1, ..., n.

For instance, when n = 4, there are 12 facial sets, 8 of them associated to a zero value in the 8-dimensional
vector of sufficient statistics. The remaining 4 co-facial sets are shown in Table 12.
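Because the cone has only 3n facets, the existence check suggested by Theorem 6.3 is cheap. The following sketch assumes the reading of the theorem in which the MLE fails to exist exactly when the observed counts vanish on one of the 3n co-facial sets listed above (a zero out-degree, a zero in-degree, or no edges among the nodes other than some node k). The function name and the two example graphs are hypothetical.

```python
def poisson_mle_exists(x):
    """Existence check for the directed Poisson model; x[i][j] counts edges i -> j."""
    n = len(x)
    out_deg = [sum(x[i][j] for j in range(n) if j != i) for i in range(n)]
    in_deg = [sum(x[j][i] for j in range(n) if j != i) for i in range(n)]
    # Co-facial sets of the first kind: a zero out-degree or in-degree.
    if 0 in out_deg or 0 in in_deg:
        return False
    # Co-facial sets of the second kind: all counts avoiding node k are zero.
    for k in range(n):
        if all(x[i][j] == 0
               for i in range(n) for j in range(n)
               if i != j and i != k and j != k):
            return False
    return True


# Directed 4-cycle 0 -> 1 -> 2 -> 3 -> 0: no co-facial set is empty.
cycle = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]]
# Bidirected star centred at node 0: every edge involves node 0.
star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
cycle_ok = poisson_mle_exists(cycle)
star_ok = poisson_mle_exists(star)
```

The star example has strictly positive in- and out-degrees, so it is detected only by a co-facial set of the second kind.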

The previous theorem implies that the number of facets of $C_A$ grows only linearly in n, unlike the number
of facets of the polytope of degree sequences $P_n$. Thus, for this model, nonexistence of the MLE is a much less
frequent phenomenon, at least combinatorially. Note, in particular, that the MLE exists even if $x_{i,j} + x_{j,i} = 0$
for some (in fact many) pairs. Theorem 6.3 can be used to easily show that the MLE exists with probability
tending to one as n increases. Indeed, the probability of a nonexistent MLE is no larger than

$\sum_{i=1}^n e^{-\sum_{j \neq i} m_{i,j}} + \sum_{i=1}^n e^{-\sum_{j \neq i} m_{j,i}} + \sum_{k=1}^n e^{-\sum_{i \neq j;\, i, j \neq k} m_{i,j}}$. (14)
Then, letting $m^* := \min_{i \neq j} m_{i,j}$, the first two terms in equation (14) are each smaller than $n e^{-(n-1) m^*}$,
while the last term is bounded from above by

$n \exp\left\{ -\left( \binom{n}{2} - 2(n - 1) \right) m^* \right\} \leq n e^{-(n-1) m^*}$,

where the last inequality is due to the fact that $\binom{n}{2} - 2(n - 1) > n - 1$ for all $n \geq 7$. Thus, (14) is bounded
from above by $3 n e^{-(n-1) m^*}$, which implies that, if m* = m*(n) = '^^''^ " , the MLE exists with probability at
least 1 - This simple calculation then shows that the MLE exists with overwhelming probability even if
the expected edge counts all tend to zero, as long as these values decay at a rate (^^^^^ ■

The results just obtained can be specialized to the Rasch model, in which the nodes are partitioned into
two sets I and J of cardinality k and l = n - k, and edges can only occur between a node $i \in I$ and a node
$j \in J$, though the number of edges between any pair of nodes (i, j) is random. The observed set of edge counts
takes the form of a k × l contingency table and the sufficient statistics are the k row sums and the l column
sums. As noted by Haberman (1977), in this case the MLE exists if and only if the row and column sums are
all positive.



A simpler Poisson model 

When the Poisson model for directed graphs described in (12) is specialized to the case of undirected graphs,
we obtain a simpler model in which the number of edges between nodes i and j, with $i \neq j$, has a Poisson
distribution with mean $m_{i,j}$ satisfying

$\log m_{i,j} = \theta_i + \theta_j$.

The previous display should be compared to (2).

In this setting, the design matrix A is the same design matrix described in Section 3, of dimension
$n \times \binom{n}{2}$. The associated convex support is the pointed polyhedral cone $C_A$ spanned by its columns. The proof
of Theorem 6.3 can then be easily adapted to derive the co-facial sets of $C_A$, which we describe in the next
result.

Corollary 6.4. The cone $C_A$ has 2n facets. The corresponding co-facial sets are as follows:

1. the n support sets of the rows of A;

2. n co-facial sets of the form $\{(i, j) : i < j, i \neq k, j \neq k\}$, one for each k = 1, ..., n.

Following the same arguments above, we see that the MLE exists with probability tending to one as long 
as the expected edge counts are of order ( j . 



6.3 The Bradley-Terry Model 

We can specialize the model described in Section 6.2 to a directed graph without multiple edges, thus
obtaining the Bradley-Terry model for pairwise comparisons. See Bradley and Terry (1952), David (1988),
Hunter (2004) and references therein. In detail, let $p_{i,j}$ denote the probability of a directed edge from i to j
and $p_{j,i}$ the probability of a directed edge from j to i. According to the Bradley-Terry model, the probabilities
of directed edges can be parametrized by a vector $\beta \in \mathbb{R}^n$ so that

$p_{i,j} = \frac{e^{\beta_i}}{e^{\beta_i} + e^{\beta_j}}, \quad \forall i \neq j$, (15)

or, equivalently, in terms of log-odds ratios, $\log \frac{p_{i,j}}{p_{j,i}} = \beta_i - \beta_j$, $\forall i < j$. Notice that this parametrization is
redundant, and identifiability is typically enforced by requiring that $\sum_i e^{\beta_i} = 1$. Data are obtained by
recording, for each pair of nodes (i, j), the outcomes of $N_{i,j}$ pairwise comparisons, where the $N_{i,j}$ are fixed
positive integers, resulting in $x_{i,j}$ instances of node i being preferred to node j and $x_{j,i}$ instances of node j
being preferred to node i, with $x_{i,j} + x_{j,i} = N_{i,j}$. The outcomes of the pairwise comparisons are assumed
mutually independent. Thus, for i < j, the Bradley-Terry model treats the n(n - 1) observed counts $\{x_{i,j} : i \neq j\}$
as a realization of mutually independent $\mathrm{Bin}(N_{i,j}, p_{i,j})$ distributions, where the probability parameters
$\{p_{i,j} : i \neq j\}$ satisfy (15).

Despite the apparent similarity between equations (1) and (15), the beta model and the Bradley-Terry
model are radically different. Indeed, for the Bradley-Terry model, it is well known that the minimal sufficient
statistics are the row sums (or the column sums) of the observed table, which correspond to the vector of
out-degrees (or in-degrees, respectively) of the network. Indeed, this model can be alternatively prescribed
as a model of quasi-symmetry and quasi-independence (see, e.g., Fienberg and Larntz, 1976). Necessary and
sufficient conditions for the existence of the MLE are due to Zermelo (1929) and Ford (1957), and can be
expressed in a graph-theoretic form as follows: the MLE exists if and only if the observed directed graph is
strongly connected, a property which we can easily check by a depth-first search. According to this condition,
a simple calculation shows that the number of facial sets corresponding to the facets of the associated convex
support (see, e.g., Barndorff-Nielsen, 1978) is

$\sum_{i=1}^{n-1} \binom{n}{i}$.

See Simons and Yao (1999) for an analysis of the existence and asymptotic normality of the MLE for the
Bradley-Terry model under the condition that all the terms $N_{i,j}$ are constant and the number of objects n
increases.
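The strong-connectivity criterion of Zermelo and Ford mentioned above can be checked with two depth-first searches, one on the observed directed graph and one on its reverse. The sketch below is illustrative: the function names and the convention that x[i][j] > 0 encodes an edge from i to j (node i preferred to node j at least once) are assumptions made for the example.

```python
def reachable(adj, start):
    """Set of nodes reachable from start by an iterative depth-first search."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v, has_edge in enumerate(adj[u]):
            if has_edge and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def bradley_terry_mle_exists(x):
    """True if the directed graph with an edge i -> j whenever x[i][j] > 0 is strongly connected."""
    n = len(x)
    wins = [[1 if i != j and x[i][j] > 0 else 0 for j in range(n)] for i in range(n)]
    reverse = [list(col) for col in zip(*wins)]
    # Strongly connected: node 0 reaches every node, and every node reaches node 0.
    return len(reachable(wins, 0)) == n and len(reachable(reverse, 0)) == n


cyclic = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]      # 0 beats 1, 1 beats 2, 2 beats 0
dominated = [[0, 1, 1], [0, 0, 1], [0, 1, 0]]   # node 0 never loses
cyclic_ok = bradley_terry_mle_exists(cyclic)
dominated_ok = bradley_terry_mle_exists(dominated)
```

In the second example node 0 is never beaten, so no path leads back to it and the graph is not strongly connected.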

We conclude this section by noting that the arguments and algorithms for facial set identification discussed in
the appendix apply to this model as well. In this case, the marginal cone is spanned by a matrix of dimension
$\left(\binom{n}{2} + n\right) \times n(n - 1)$, the first $\binom{n}{2}$ rows corresponding to the sampling constraints $\{x_{i,j} + x_{j,i} = N_{i,j} : i < j\}$,
and the remaining n rows to the row sums.

6.4 $p_1$ Models

Both the beta model and the Bradley-Terry model can be obtained as special cases of the class of $p_1$ models
for directed graphs proposed by Holland and Leinhardt (1981). In fact, existence of the MLE and the
identification of the facial sets for $p_1$ models can be treated using the very same arguments we have presented in
the first part of the article. In this final section we detail these arguments for the more general and
challenging class of $p_1$ models. We remark that the asymptotic properties of $p_1$ models are largely unknown and, as
discussed by Haberman (1981), such an analysis appears to be rather daunting.

Just like in the other network models considered thus far, in $p_1$ models the occurrence of a random edge
between any pair of nodes i and j, or dyad, is modeled independently from all the other edges. We keep
track of four possible edge configurations within each dyad: node i has an outgoing edge into node j ($i \to j$);
node i has an incoming edge originating from node j ($i \leftarrow j$); nodes i and j are linked by a bi-directed edge
($i \leftrightarrow j$); and nodes i and j are not adjacent in the network. Following the notation we established in
Petrovic et al. (2010), which is slightly different from the original notation of Holland and Leinhardt (1981),
for every pair of nodes (i, j) we define the probability vector

$p_{i,j} = (p_{i,j}(0,0), p_{i,j}(1,0), p_{i,j}(0,1), p_{i,j}(1,1)) \in \Delta_3$ (16)

containing the probabilities of the four possible edge types, where $\Delta_3$ is the standard simplex in $\mathbb{R}^4$. The
numbers $p_{i,j}(1,0)$, $p_{i,j}(0,1)$ and $p_{i,j}(1,1)$ denote the probabilities of the edge configurations $i \to j$, $i \leftarrow j$ and
$i \leftrightarrow j$, respectively, and $p_{i,j}(0,0)$ is the probability that there is no edge between i and j (thus, 1 denotes
the outgoing side of the edge). Notice that, by symmetry, $p_{i,j}(a, b) = p_{j,i}(b, a)$ for all $a, b \in \{0, 1\}$, and that

$p_{i,j}(0,0) + p_{i,j}(1,0) + p_{i,j}(0,1) + p_{i,j}(1,1) = 1$. (17)

In $p_1$ models, the $\binom{n}{2}$ dyads are modeled as mutually independent draws from multinomial distributions
with class probabilities $p_{i,j}$, i < j. Specifically, the Holland-Leinhardt $p_1$ model specifies the multinomial
probabilities of each dyad (i, j) in logarithmic form as follows (see Holland and Leinhardt, 1981):

$\log p_{i,j}(0,0) = \lambda_{i,j}$
$\log p_{i,j}(1,0) = \lambda_{i,j} + \alpha_i + \beta_j + \theta$
$\log p_{i,j}(0,1) = \lambda_{i,j} + \alpha_j + \beta_i + \theta$
$\log p_{i,j}(1,1) = \lambda_{i,j} + \alpha_i + \beta_j + \alpha_j + \beta_i + 2\theta + \rho_{i,j}$ (18)

The parameter $\alpha_i$ quantifies the effect of an outgoing edge from node i, the parameter $\beta_j$ instead measures
the effect of an incoming edge into node j, while $\rho_{i,j}$ controls the added effect of reciprocated edges (in both
directions). The parameter $\theta$ measures the propensity of the network to have edges and, therefore, controls
the "density" of the graph. The parameters $\{\lambda_{i,j} : i < j\}$ are normalizing constants to ensure that (17) holds
for each dyad (i, j) and need not be estimated. Note that, in order for the model to be identifiable,
additional linear constraints need to be imposed on its parameters. We refer the interested readers to the
original paper on the $p_1$ model by Holland and Leinhardt (1981) for an extensive interpretation of the model
parameters.
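To make the role of the normalizing constants concrete, the four dyadic probabilities can be computed from the remaining parameters by normalization. The sketch below assumes the standard Holland-Leinhardt form of (18), in which the unnormalized log-weights of the configurations (0,0), (1,0), (0,1), (1,1) are 0, alpha_i + beta_j + theta, alpha_j + beta_i + theta and alpha_i + beta_j + alpha_j + beta_i + 2 theta + rho. The function name and the parameter values are hypothetical.

```python
from math import exp


def dyad_probabilities(alpha, beta, theta, rho, i, j):
    """Return (p(0,0), p(1,0), p(0,1), p(1,1)) for the dyad (i, j)."""
    weights = [
        1.0,                                                               # no edge
        exp(alpha[i] + beta[j] + theta),                                   # i -> j
        exp(alpha[j] + beta[i] + theta),                                   # i <- j
        exp(alpha[i] + beta[j] + alpha[j] + beta[i] + 2 * theta + rho),    # i <-> j
    ]
    total = sum(weights)  # plays the role of exp(-lambda_{i,j})
    return tuple(w / total for w in weights)


# Hypothetical parameter values for a two-node illustration.
alpha = [0.0, 0.5]
beta = [0.2, -0.1]
p01 = dyad_probabilities(alpha, beta, theta=-1.0, rho=0.3, i=0, j=1)
p10 = dyad_probabilities(alpha, beta, theta=-1.0, rho=0.3, i=1, j=0)
```

Swapping the roles of i and j exchanges the second and third entries, which is the symmetry noted after (16), and the four entries always sum to one, as required by (17).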

As noted in Fienberg and Wasserman (1981a,b), different variants of the $p_1$ model can be obtained by
constraining the model parameters. In Petrovic et al. (2010) we consider three special cases of the basic $p_1$
model, which differ in the way the reciprocity parameter is modeled:

1. $\rho_{i,j} = 0$, no reciprocal effect;

2. $\rho_{i,j} = \rho$, constant reciprocation;

3. $\rho_{i,j} = \rho + \rho_i + \rho_j$, edge-dependent reciprocation.

As is often the case with network data, we assume that data become available in the form of one
observed network. Thus, each dyad (i, j) is observed in only one of its four possible states and this one
observation is a random vector in $\mathbb{R}^4$ with a Multinomial(1, $p_{i,j}$) distribution. As a result, data are sparse
and, even though the dyadic probabilities are strictly positive according to the defining equations (18), only
some of the model parameters may be estimated from the data. Extensions to the case in which the dyads are
observed multiple times are straightforward.

For a network on n nodes, we represent the vector of 2n(n - 1) dyadic probabilities as

$p = (p_{1,2}, p_{1,3}, \ldots, p_{n-1,n}) \in \mathbb{R}^{2n(n-1)}$,

where, for each i < j, $p_{i,j}$ is given as in (16). The $p_1$ model is the set of all probability distributions that
satisfy the Holland-Leinhardt equations (18). The design matrix A associated with a given $p_1$ model can be
constructed as follows (this construction is by no means unique and leads to rank-deficient matrices, though
it is rather simple). The columns of A are indexed by the entries of the vectors $p_{i,j}$, i < j, where the $p_{i,j}$'s
are ordered lexicographically, and its rows by the model parameters, ordered arbitrarily. The (r, c) entry of
A is equal to the coefficient of the r-th parameter in the logarithmic expansion of the c-th probability as
indicated in (18). In particular, notice that the entries of A can only be 0, 1 or 2. For example, in the case
$\rho_{i,j} = \rho + \rho_i + \rho_j$, the matrix A has $\binom{n}{2} + 3n + 2$ rows. When n = 3, the design matrix corresponding to this
model is
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Let Sn = ^ j} C {0, i}2n(n-i) denote the sample space, i.e. the set of all observable networks on 

n nodes. Then, every point x in the sample space X can be written as 

X = (Xi,2,a;i,3, . . . ,Xn-l,n), 

where each of the (2) subvectors Xi,j is a vertex of A3. Notice that \Xn \ = 4"("~^\ This way of representing 
a network on n nodes with a highly-constrained 0/1 vector of dimension 2n{n — 1) may appear cumbersome 
and redundant. Indeed, as in Holland and Leinhardt (1981), we could more naturally represent an rt-node 
network using the n x n incidence matrix with 0/1 off-diagonal entries, where the (i, j) entry is 1 is there 
is an edge from i to j and otherwise. While this representation is more intuitive and parsimonious (as 
it only requires "^"^""^^ bits), whenever p ^ 0, the sufficient statistics for the reciprocity parameter are not 
linear functions of the observed network. As a consequence, the adjacency matrix representation does not 
lead directly to a linear exponential family. 
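
As an illustration of this encoding, the following minimal Python sketch maps an $n \times n$ 0/1 adjacency matrix to the 0/1 vector of dimension $2n(n-1)$ described above. The function name `encode_network` and the ordering of the four dyad states inside each block are assumptions made for the example; the text does not fix them.

```python
from itertools import combinations

# Assumed ordering of the four dyad states inside each block of 4 coordinates:
# (no edge, i->j only, j->i only, both directions). This ordering is hypothetical.
STATES = [(0, 0), (1, 0), (0, 1), (1, 1)]


def encode_network(adj):
    """Encode a directed network as one vertex of the simplex per dyad."""
    n = len(adj)
    x = []
    for i, j in combinations(range(n), 2):   # dyads i < j in lexicographic order
        block = [0, 0, 0, 0]
        block[STATES.index((adj[i][j], adj[j][i]))] = 1
        x.extend(block)
    return x


def sample_space_size(n):
    """Number of observable networks: four states for each of the n(n-1)/2 dyads."""
    return 4 ** (n * (n - 1) // 2)


# Directed 3-cycle 0 -> 1 -> 2 -> 0 (nodes are 0-based in this sketch).
cycle = [[0, 1, 0],
         [0, 0, 1],
         [1, 0, 0]]
x = encode_network(cycle)
```

For this 3-node example the vector `x` has 12 coordinates, with exactly one entry equal to 1 in each block of four, and `sample_space_size(3)` matches the count of 64 networks.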

The convex support for this family is the polytope obtained as the Minkowski sum

$$P_A = \sum_{i<j} \mathrm{convhull}(A_{i,j}),$$

where $A_{i,j}$ is the sub-matrix of $A$ comprised by the four columns referring to the dyad $(i,j)$. Given an
observed network $x \in S_n$, the MLE of the parameters exists if and only if $Ax \in \mathrm{ri}(P_A)$ and, when the MLE
does not exist, the associated facial set provides the non-estimable probability parameters. As with the
polytope of degree sequences for the beta model, the combinatorial complexity of this object is quite high
and increases very rapidly with $n$ (though, unlike for the beta model, the convex supports for these models do
not appear to be known or well studied polytopes). See Table 13 and the discussion below.

The arguments and results of Section 3 and the Cayley trick described in the appendix apply to the case
of p1 models as well, and yield the following result.

Theorem 6.5. For any p1 model with associated design matrix $A$, the MLE exists if and only if $Ax \in \mathrm{ri}(C_A)$,
where $C_A = \mathrm{cone}(A)$, and the facial sets of $P_A$ are also facial sets of $C_A$.

As shown in the appendix and further illustrated in Tables 13 and 14, it is algorithmically much simpler to deal
with the cone $C_A$ than with the polytope $P_A$.



Numerical Experiments 

We conclude this section by describing some numerical experiments illustrating the reduction in complexity
associated with the Cayley trick described in the appendix for the general p1 model. Table 13 displays the
number of vertices of the polytopes $P_A$ for the three p1 model specifications we consider and various network
sizes. The last column of the table contains the number of columns of the design matrix, which is also
the number of extreme rays of the marginal cone $C_A$. In comparison, the number of vertices of $P_A$, whose
determination is computationally very hard, is very large and grows extremely fast with $n$.

  n    $\rho_{i,j} = 0$    $\rho_{i,j} = \rho$    $\rho_{i,j} = \rho_i + \rho_j$    $2n(n-1)$
  3    62                  62                     62                                12
  4    1,862               2,415                  3,086                             24
  5    88,232              158,072                347,032                           40

Table 13: Number of vertices of the polytopes $P_A$ for different specifications of the p1 model and different
network sizes. Computations carried out using minksum (Weibel, 2005). The last column indicates the
number of columns of the design matrix $A$, which corresponds to the number of generators of $C_A$.

Table 14: Number of facets, dimensions and ambient dimensions of the cones $C_A$ for different specifications
of the p1 model and different network sizes. The number of facets of $C_A$ is equal to the number of
facets of $P_A$ plus $\binom{n}{2}$, these additional facets corresponding to the sampling constraints of one observation
per dyad. (Only the following entries of this table survive, without their row and column positions:
3,181; 25; 94,337; 26; 28; 57,527; 31; 34.)

In Table 14 we report the number of facets, dimensions and ambient dimensions of the cones $C_A$ for
different values of $n$ and for the three specifications of the reciprocity parameters $\rho_{i,j}$ we consider here.
Though this only provides an indirect measure of the complexity of these models and of the non-zero
patterns in extended MLEs, it does show how quickly the complexity of p1 models may scale with the network
size $n$.

Another point of interest is the assessment of how often the MLE fails to exist. In fact, because of
the product Multinomial sampling constraint, nonexistence of the MLE is quite severe, especially for smaller
networks. Below we report our findings, which are necessarily restricted to networks of small sizes.

The case $n = 3$. The sample space consists of $4^3 = 64$ possible networks. When $\rho_{i,j} = 0$ for all $i$ and $j$, there
are 63 different observable sufficient statistics, only one of which belongs to $\mathrm{ri}(P_A)$. Thus, only one of the 63
observable sufficient statistics leads to the existence of the MLE. This sufficient statistic corresponds to
two networks (the displays of the two vectors $x$ are missing from this copy).

In both cases, the associated MLE is the 12-dimensional vector with entries all equal to 0.25. Incidentally, the
polytope $P_A$ has 62 vertices and 30 facets. When $\rho_{i,j} = \rho \neq 0$ or $\rho_{i,j} = \rho_i + \rho_j$, the MLE never exists.
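
The count of 63 sufficient statistics for $n = 3$ can be checked by brute force. The sketch below rests on an assumption that is not restated in this section: when $\rho_{i,j} = 0$, two networks share the same sufficient statistic exactly when they have the same out-degree and in-degree sequences. The function names are hypothetical, and nodes are 0-based.

```python
from collections import defaultdict
from itertools import product


def degree_statistics(n, arcs):
    """Out-degree and in-degree sequences of a directed network given as arcs."""
    out_deg, in_deg = [0] * n, [0] * n
    for i, j in arcs:
        out_deg[i] += 1
        in_deg[j] += 1
    return tuple(out_deg), tuple(in_deg)


def group_networks(n):
    """Group all directed networks on n nodes by their degree statistics."""
    possible = [(i, j) for i in range(n) for j in range(n) if i != j]
    groups = defaultdict(list)
    for mask in product((0, 1), repeat=len(possible)):
        arcs = tuple(a for a, keep in zip(possible, mask) if keep)
        groups[degree_statistics(n, arcs)].append(arcs)
    return groups


groups = group_networks(3)
# Statistics that are shared by more than one network.
shared = [nets for nets in groups.values() if len(nets) > 1]
```

Under the stated assumption, the 64 networks fall into 63 groups, and the only statistic shared by two networks is that of the two directed 3-cycles, in which every node has out-degree and in-degree equal to 1.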

The case $n = 4$. The sample space contains 4,096 observable networks. If $\rho_{i,j} = 0$, there are 2,656 different
observable sufficient statistics, only 64 of which yield existent MLEs. Overall, out of the 4,096 possible
networks, only 426 have MLEs. When $\rho_{i,j} = \rho \neq 0$, there are 3,150 different observable sufficient statistics,
only 48 of which yield existent MLEs. Overall, out of the 4,096 possible networks, only 96 have MLEs. When
$\rho_{i,j} = \rho_i + \rho_j$, there are 3,150 different observable sufficient statistics and the MLE never exists.

The case $n = 5$. The sample space consists of $4^{10} = 1{,}048{,}576$ different networks. If $\rho_{i,j} = 0$, there are
225,025 different sufficient statistics, and the MLE exists for 7,983. If $\rho_{i,j} = \rho \neq 0$, the number of distinct
possible sufficient statistics is 349,500, and the MLE exists in 12,684 cases. Finally, when $\rho_{i,j} = \rho_i + \rho_j$, the
number of different sufficient statistics is 583,346 and the MLE never exists.

7 Discussion and extensions 

We have used polyhedral geometry to analyze the conditions for existence of the MLE of a generalized
version of the β-model and to derive finite sample bounds for the probability associated with the existence of
the MLE. Our results offer a novel and explicit characterization of the patterns of edge counts leading to
non-existent MLEs. The problem of nonexistence occurs in numbers and with a complexity that was not
previously known. Our results allow us to sharpen conditions for existence of the MLE. Our analysis in
particular highlights the fact that having node degrees equal to 0 or $n - 1$ is only a sufficient condition
for nonexistence of the MLE and non-estimability of the edge probabilities. We show that we need to
account for many more edge patterns. We note that the use of polyhedral geometry in statistical models for
discrete data is a hallmark of the theory of exponential families, but its considerable potential for use and
applications in the analysis of log-linear and network models has only recently begun to be investigated (see
Fienberg and Rinaldo, 2011; Rinaldo et al., 2009).
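
To make the point about degrees 0 and $n - 1$ concrete, the following sketch tests whether a degree sequence lies in the interior of $P_n$ for ordinary graphs (one Bernoulli observation per edge). It assumes that $P_n$ is described by the inequalities $\sum_{i \in S} d_i - \sum_{i \in T} d_i \le |S|(n - 1 - |T|)$ over all disjoint $S$, $T$ with $S \cup T \neq \emptyset$; this is taken here to be the explicit system of defining inequalities mentioned in the text, which is not reproduced in this section. The function name `tight_pairs` is hypothetical, and the enumeration is exponential in $n$, so it is only meant for very small examples.

```python
from itertools import product


def tight_pairs(d):
    """Return the pairs (S, T) whose inequality holds with equality at d.

    An empty list means d satisfies every inequality strictly, i.e. d is an
    interior point under the assumed description of the polytope.
    """
    n = len(d)
    tight = []
    # Label each node: 0 = in neither set, 1 = in S, 2 = in T.
    for labels in product((0, 1, 2), repeat=n):
        S = [i for i, lab in enumerate(labels) if lab == 1]
        T = [i for i, lab in enumerate(labels) if lab == 2]
        if not S and not T:
            continue
        slack = len(S) * (n - 1 - len(T)) - sum(d[i] for i in S) + sum(d[i] for i in T)
        if slack < 0:
            raise ValueError("d violates an inequality, so it is outside the polytope")
        if slack == 0:
            tight.append((S, T))
    return tight


cycle_pairs = tight_pairs((2, 2, 2, 2))   # degrees of the 4-cycle
path_pairs = tight_pairs((1, 1, 2, 2))    # degrees of the path 0 - 2 - 3 - 1
```

The 4-cycle gives an interior point, whereas the path has no node of degree 0 or 3 and still lies on the boundary: the inequality for $S = \{2, 3\}$, $T = \{0, 1\}$ (0-based labels) is tight.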

Our generalization of the β-model allows for Poisson and binomial, not simply Bernoulli, distributions
for edges. Email databases and others involving repeated transactions among pairs of parties provide
the simplest examples of networks where edges can occur multiple times. These are often
analyzed as weighted networks, but that may not make as much sense as using a Poisson distribution for
random numbers of occurrences.

As our results indicate, the nonexistence of the MLE is equivalent to non-estimability of a subset of the
parameters of the model, but by no means does it imply that no statistical inference can take place. In
fact, when the MLE does not exist, there always exists a "restricted" β-model that is fully specified by the
appropriate facial set, and for which all parameters are fully estimable. Thus, for such a smaller model,
traditional statistical tasks such as hypothesis testing and assessment of parameter uncertainty are possible, even
though it becomes necessary to adjust the number of degrees of freedom for the non-estimable parameters.
A complete description of this approach, which is rooted in the theory of extended exponential families, is
beyond the scope of the article. See Fienberg and Rinaldo (2011) for details.

We can extend our study of the β-model in a number of ways. In the full original version of the article
we considered various generalizations of the β-model setting, including the β-model with random numbers
of edges, the Rasch model from item response theory, the Bradley-Terry paired comparisons model and the
p1 network model. For most of these models we were able to carry out a fairly explicit analysis based
on the underlying geometry, but for the full p1 model the complexity of the model polytope appears to
make such a direct analysis very difficult (this is reflected in the high complexity of the Markov basis for the p1
model, of which we give a full account in Petrovic et al., 2010). Another interesting extension of our results of
Section 4 would be to translate our conditions, which are formulated in terms of expected degree sequences,
into conditions on the $p_{i,j}$'s themselves, for instance by establishing appropriate bounds for $\min_{i<j} p_{i,j}$,
$\max_{i<j} p_{i,j}$, or similar maxima over pairs of nodes.

We conclude with some remarks on the computational aspects of our analysis, which constitute a nontrivial
component of our work and are of key importance for detecting the nonexistence of the MLE and
identifying estimable parameters. The main difficulty in applying our results is that the polytope of degree
sequences $P_n$ is difficult to handle algorithmically in general. Indeed, $P_n$ arises as a Minkowski sum and, even
though the system of defining inequalities is given explicitly, its combinatorial complexity grows exponentially
in $n$. More importantly, the vertices of $P_n$ are not known explicitly. Algorithms for obtaining the
vertices of $P_n$, such as minksum (see Weibel, 2005), are computationally expensive and require generating
all the points $\{Ax, x \in \mathcal{G}_n\}$, where $|\mathcal{G}_n| = 2^{\binom{n}{2}}$, a task that, even for $n$ as small as 10, is impractical. See,
for instance, our analysis of the p1 model in the full version of the article. Thus, deciding whether a given
degree sequence is a point in the interior of $P_n$ and identifying the facial set corresponding to an observed
degree sequence on its boundary is highly non-trivial. Our strategy to overcome these problems entails
re-expressing the β-model as a log-linear model with $\binom{n}{2}$ product-multinomial sampling constraints. This
approach is not new, and it harks back to the earlier re-expression of the Holland-Leinhardt p1 model and
its natural generalizations as log-linear models (Fienberg and Wasserman, 1981a,b; Fienberg et al., 1985;
Meyer, 1983). Though this re-parametrization increases the dimensionality of the problem, it nonetheless
has the crucial computational advantage of reducing the determination of the facial sets of $P_n$ to the
determination of the facial sets of a pointed polyhedral cone spanned by $n(n-1)$ vectors, which is a much simpler
object to analyze, both theoretically and algorithmically. This procedure is known as the Cayley embedding
in polyhedral geometry, and Fienberg and Rinaldo (2011) describe its use in the analysis of log-linear
models. The advantages of this re-parametrization are two-fold. First, it allows us to use the highly optimized
algorithms available in polymake for listing explicitly all the facial sets of $P_n$. This is how we computed the
facial sets in all the examples presented in this article. Secondly, the general algorithms for detecting
nonexistence of the MLE and identifying facial sets proposed in Fienberg and Rinaldo (2011), which can handle
larger dimensional models (with $n$ in the order of hundreds), can be directly applied to this problem. This
reference is also relevant for dealing with inference under a non-existent MLE.
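
The cost of the brute-force route can be seen with a small enumeration. The sketch below generates every point $Ax$ for ordinary graphs on $n$ nodes, taking $A$ to be the node-by-edge incidence matrix so that $Ax$ is the degree sequence (an assumption consistent with the proofs, where $a_{i,j}$ is the column for the pair $i < j$). The function names are hypothetical.

```python
from itertools import combinations, product


def degree_sequences(n):
    """Set of all points Ax, i.e. degree sequences of labeled graphs on n nodes."""
    pairs = list(combinations(range(n), 2))
    seen = set()
    for x in product((0, 1), repeat=len(pairs)):   # one 0/1 entry per pair i < j
        d = [0] * n
        for (i, j), x_ij in zip(pairs, x):
            d[i] += x_ij
            d[j] += x_ij
        seen.add(tuple(d))
    return seen


def num_graphs(n):
    """Number of labeled graphs on n nodes, 2^(n choose 2)."""
    return 2 ** (n * (n - 1) // 2)


small = {n: len(degree_sequences(n)) for n in (3, 4)}
```

For $n = 4$ the 64 graphs produce 54 distinct degree sequences, but `num_graphs(10)` is already $2^{45}$, which is why the enumeration is not a practical way to handle $P_n$.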

Software 

The R routines used to carry out the computations for the results presented in the paper and for creating the
input files for polymake are available at http://www.stat.cmu.edu/~arinaldo/Rinaldo_Petrovic_Fienberg_Rcode.txt

9 Proofs

Proof of Theorem 3.1. Throughout the proof, we will use standard results and terminology from the theory
of exponential families, for which standard references are Brown (1986) and Barndorff-Nielsen (1978). The
polytope

$$S_n := \mathrm{convhull}(\{Ax, x \in \mathcal{S}_n\})$$

is the convex support for the sufficient statistics of the natural exponential family described in Section 2.
Furthermore, by a fundamental result in the theory of exponential families (see, e.g., Theorem 9.13 in
Barndorff-Nielsen, 1978), the MLE of the natural parameter $\beta \in \mathbb{R}^n$ (or, equivalently, of the set of probabilities
$\{p_{i,j}, i < j\} \in \mathbb{R}^{\binom{n}{2}}$ satisfying (1)) exists if and only if $d \in \mathrm{int}(S_n)$. Thus, it is sufficient to show that
$d \in \mathrm{int}(S_n)$ if and only if $d \in \mathrm{int}(P_n)$.

Denote with $a_{i,j}$ the column of $A$ corresponding to the ordered pair $(i,j)$ with $i < j$, and set

$$P_{i,j} = \mathrm{convhull}\{0, a_{i,j}\} \subset \mathbb{R}^n. \quad (19)$$

Each $P_{i,j}$ is a line segment between its vertices 0 and $a_{i,j}$. Then, $P_n$ can be expressed as the zonotope
obtained as the Minkowski sum of the line segments $P_{i,j}$:

$$P_n = \sum_{i<j} P_{i,j}. \quad (20)$$

This identity can be established as follows. On one hand, $P_n$ is the convex hull of vectors that are Boolean
combinations of the columns of $A$. Since all such combinations are in $\sum_{i<j} P_{i,j}$ and both $P_n$ and $\sum_{i<j} P_{i,j}$
are closed sets, we obtain $P_n \subseteq \sum_{i<j} P_{i,j}$. On the other hand, the vertices of $\sum_{i<j} P_{i,j}$ are also Boolean
combinations of the columns of $A$ (see, e.g., Corollary 2.2 in Fukuda, 2004), and, therefore, $\sum_{i<j} P_{i,j} \subseteq P_n$.
Equation (20) shows, in particular, that $d \in P_n$. Furthermore, using the same arguments, we see that,
similarly to $P_n$, $S_n$ too can be expressed as a Minkowski sum:

$$S_n = \sum_{i<j} S_{i,j},$$

where

$$S_{i,j} := \mathrm{convhull}\{0, N_{i,j} a_{i,j}\}$$

is the rescaling of $P_{i,j}$ by a factor of $N_{i,j}$. In fact, we will prove that $S_n$ and $P_n$ are combinatorially equivalent.

For a polytope $P$ and a vector $c$, we set $F(P; c) := \{x \in P : x^\top c \ge y^\top c, \forall y \in P\}$. Any face $F$ of $P$ can
be written in this way, where $c$ is any vector in the interior of the normal cone to $F$. By Proposition 2.1 in
Fukuda (2004), $F$ is a face of $P_n$ with $F = F(P_n; c)$ if and only if it can be written uniquely as

$$F(P_n; c) = \sum_{i<j} F(P_{i,j}; c),$$

for any $c$ in the interior of the normal cone to $F$. It is immediate to see that $F(P_{i,j}; c)$ is a face of $P_{i,j}$ if and
only if $F(S_{i,j}; c)$ is a face of $S_{i,j}$, and that $F(S_{i,j}; c) = N_{i,j} F(P_{i,j}; c)$; in fact, $P_{i,j}$ and $S_{i,j}$ are combinatorially
equivalent. Therefore, invoking again Proposition 2.1 in Fukuda (2004), we conclude that $\sum_{i<j} F(P_{i,j}; c)$ is a face
of $P_n$ if and only if

$$\sum_{i<j} N_{i,j} F(P_{i,j}; c)$$

is a face of $S_n$ (and this representation is unique). From this, we see that $P_n$ and $S_n$ have the same normal
fan and, therefore, are combinatorially equivalent. ■

Proof of Lemma 3.2. By Proposition 2.1 in Fukuda (2004),

$$F = F(P_n; c) = \sum_{i<j} F(P_{i,j}; c), \quad (21)$$

for any $c$ in the interior of the normal cone to $F$, where the above representation is unique. Since $P_{i,j}$ is a
line segment (see (19)), its only proper faces are the vertices 0 and $a_{i,j}$. Let the set $\mathcal{F}$ be the complement
of the set of pairs $(i,j)$ with $i < j$ such that $F(P_{i,j}; c)$ is either the vector 0 or $a_{i,j}$. By the uniqueness of
the representation (21), $\mathcal{F}$ is unique as well and, in particular, maximal. Furthermore, as it depends on $F$
only through the interior of its normal cone and since the interiors of the normal cones of $P_n$ are disjoint,
different faces will be associated with different facial sets. ■
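
The decomposition (21) can be traced by hand for a given direction $c$. The sketch below assumes that $a_{i,j}$ has ones in coordinates $i$ and $j$ and zeros elsewhere, so that $c^\top a_{i,j} = c_i + c_j$; the face of each segment is then the vertex $a_{i,j}$, the vertex 0, or the whole segment according to the sign of $c_i + c_j$. The function name and the direction used in the example are hypothetical.

```python
from itertools import combinations


def faces_of_segments(c):
    """Split the pairs i < j according to the face F(P_ij; c) of each segment.

    Returns (facial, cofacial): `facial` lists the pairs whose face is the whole
    segment, and `cofacial` maps each remaining pair to 0 (face is the vertex 0)
    or 1 (face is the vertex a_ij). Pairs are reported with 1-based labels.
    """
    facial, cofacial = [], {}
    for i, j in combinations(range(len(c)), 2):
        value = c[i] + c[j]              # inner product of c with a_ij
        if value == 0:
            facial.append((i + 1, j + 1))
        elif value > 0:
            cofacial[(i + 1, j + 1)] = 1  # maximizer over the segment is a_ij
        else:
            cofacial[(i + 1, j + 1)] = 0  # maximizer over the segment is 0
    return facial, cofacial


facial, cofacial = faces_of_segments((-1, -1, 1, 1))
```

With $n = 4$ and $c = (-1, -1, 1, 1)$, the pairs outside the facial set are $(1,2)$, where the face is the vertex 0, and $(3,4)$, where the face is the vertex $a_{3,4}$.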

Proof of Theorem 4.1. Let $d = (d_1, \ldots, d_n)$ be the random vector defined in (6), and let $\bar{d} = (\bar{d}_1, \ldots, \bar{d}_n)$
denote the expected degree sequence. We will show that, under the stated assumptions, $d \in \mathrm{int}(P_n)$ with
probability no smaller than $1 - \frac{2}{n^{2c-1}}$.

Since $N_n$ is constant, we can conveniently re-express the random vector $d$ as an average of independent
and identically distributed graphical degree sequences. In detail, we can write

$$d = \frac{1}{N} \sum_{k=1}^{N} d^{(k)}, \quad (22)$$

where each $d^{(k)}$ is the degree sequence arising from an independent realization of a random graph with
edge probabilities $\{p_{i,j} : i < j\}$, for $k = 1, \ldots, N$.

Thus, each $d_i$ is the sum of $N(n-1)$ independent random variables taking values in $\{0, \frac{1}{N}\}$. Then, an
application of Hoeffding's inequality and of the union bound yields that the event

$$O_n := \left\{ \max_i |d_i - \bar{d}_i| \le \sqrt{\frac{cn \log n}{N}} \right\} \quad (23)$$

occurs with probability at least $1 - \frac{2}{n^{2c-1}}$. Throughout the rest of the proof we will assume that the event $O_n$
holds.

By assumption (i), for each $i$,

$$0 < C + \sqrt{\frac{cn \log n}{N}} \le \bar{d}_i - \sqrt{\frac{cn \log n}{N}} \le d_i \le \bar{d}_i + \sqrt{\frac{cn \log n}{N}} \le n - 1 - C - \sqrt{\frac{cn \log n}{N}} < n - 1,$$

so that

$$0 < d_i < n - 1, \quad i = 1, \ldots, n. \quad (24)$$

Notice that the assumed constraint on the range of $C$ guarantees the above inequalities are well defined.
Next, for each pair $(S, T) \in \mathcal{P}$,

$$|g(S,T,d,n) - g(S,T,\bar{d},n)| \le |S \cup T| \max_i |d_i - \bar{d}_i|,$$

which yields

$$g(S,T,d,n) \ge g(S,T,\bar{d},n) - |S \cup T| \sqrt{\frac{cn \log n}{N}}.$$

Using assumption (ii), the previous display implies that

$$\min_{(S,T) \in \mathcal{P}} g(S,T,d,n) > C > 0. \quad (25)$$

Thus, we have shown that (24) and (25) hold, provided that the event $O_n$ is true and assuming (i) and (ii).
Therefore, by Theorem 3.3 the MLE exists.

■

Proof of Corollary 4.2. Using the same setting and notation of Theorem 4.1, with $\bar{d}$ denoting the expected
degree sequence, we will assume throughout the proof that the event

$$O'_n := \left\{ \max_k \max_i |d^{(k)}_i - \bar{d}_i| \le \sqrt{cn \log n} \right\}$$

holds true. Note that, by Hoeffding's inequality and the union bound,

$$\mathbb{P}\left( (O'_n)^c \right) \le 2 \exp\{-2c \log n + \log n + \log N\} \le \frac{2}{n^{2c-2}},$$

where we have used the inequality $\log N \le \log n$. A simple calculation shows that, when $O'_n$ is satisfied, we
also have

$$\max_i |d_i - \bar{d}_i| \le \sqrt{cn \log n}.$$

Then, by the same arguments used in the proof of Theorem 4.1, assumption (i') yields that

$$0 < d_i < n - 1, \quad i = 1, \ldots, n, \quad (26)$$

and, for each pair $(S, T) \in \mathcal{P}$,

$$g(S,T,d,n) \ge g(S,T,\bar{d},n) - |S \cup T| \sqrt{cn \log n}. \quad (27)$$

Now, it is easy to see that, on the event $O'_n$, assumption (i') also yields

$$\min_k \min_i \min\left\{ d^{(k)}_i, n - 1 - d^{(k)}_i \right\} > \sqrt{cn \log n} + C. \quad (28)$$

We now show that, when (26) and the previous equation are satisfied, the MLE exists if

$$\min g(S,T,d,n) > C > 0. \quad (29)$$

Indeed, suppose that (26) is true and that $d$ belongs to the boundary of $P_n$. Then, by the integrality of the
polytope $P_n$, there exist non-empty and disjoint subsets $T$ and $S$ of $\{1, \ldots, n\}$ satisfying the conditions of
Lemma 3.4 for each of the degree sequences $d^{(k)}$. If $\min_k \min_i d^{(k)}_i > \sqrt{cn \log n} + C$, then, necessarily,
$|S| > \sqrt{cn \log n} + C$, because $|S|$ is the maximal degree of every node $i \in T$. Similarly, since each $i \in S$ has
degree at least $|S| - 1 + |(S \cup T)^c|$, if $\max_k \max_i d^{(k)}_i < n - 1 - \sqrt{cn \log n} - C$, the inequality

$$|S| - 1 + |(S \cup T)^c| < n - 1 - \sqrt{cn \log n} - C$$

must hold, implying that $|T| = n - |S| - |(S \cup T)^c| > \sqrt{cn \log n} + C$. Thus, we have shown that, if (26) and
(28) hold, and $d$ belongs to the boundary of $P_n$, the cardinalities of the sets $S$ and $T$ defining the facet of $P_n$
to which $d$ belongs cannot be smaller than $\sqrt{cn \log n} + C$. By Theorem 3.3, when (26) and (28) hold, (29)
implies that $d \in \mathrm{int}(P_n)$, so the MLE exists. However, equation (27) and assumption (ii') imply (29), so the
proof is complete. ■
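
The probability bound used at the start of this proof can be spelled out as follows. This is a sketch of the arithmetic under two assumptions: $\bar{d}_i$ denotes the expected value of $d^{(k)}_i$, and the threshold in the event $O'_n$ is $\sqrt{cn \log n}$, as in (27).

```latex
\begin{align*}
\Pr\bigl(|d^{(k)}_i - \bar{d}_i| > \sqrt{cn\log n}\bigr)
  &\le 2\exp\Bigl(-\frac{2\,cn\log n}{n-1}\Bigr)
   \le 2\exp(-2c\log n)
   && \text{(Hoeffding, $n-1$ Bernoulli terms)} \\
\Pr\bigl((O'_n)^c\bigr)
  &\le nN \cdot 2\exp(-2c\log n)
   = 2\exp\{-2c\log n + \log n + \log N\}
   && \text{(union bound over $i$ and $k$)} \\
  &\le 2\exp\{-(2c-2)\log n\} = \frac{2}{n^{2c-2}}
   && \text{(using $\log N \le \log n$)}
\end{align*}
```

The first line uses that each $d^{(k)}_i$ is a sum of $n - 1$ independent variables with values in $\{0, 1\}$, so the Hoeffding exponent is $-2t^2/(n-1)$ with $t = \sqrt{cn \log n}$.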

Proof of Theorem 6.3. The result about existence of the MLE follows from a direct application of
Theorem 9.13 in Barndorff-Nielsen (1978) or Theorem 5.5 in Brown (1986), since $C_n$ is the convex support
for the exponential family of equation (13).

As for the claims regarding the facets of $C_n$, since the row span of $A$ contains the constant vectors, we
study the facets of the polytope $P := \mathrm{conv}(B) \subset \mathbb{R}^n \times \mathbb{R}^n$. Denote by $x_i$ and $x'_i$ the coordinates of the two
spaces, and by $e_i$ and $e'_i$ the corresponding standard unit vectors in $\mathbb{R}^n$. The polytope $P$ is contained in the
product of simplices $\Delta_{n-1} \times \Delta_{n-1} := \mathrm{conv}\{e_i \times e'_j : 1 \le i, j \le n\}$, where, for two vectors $x$ and $x'$ in $\mathbb{R}^n$,
$x \times x'$ denotes the point $(x, x')$ of $\mathbb{R}^n \times \mathbb{R}^n$.

The point $e_i \times e'_j$ corresponds to the $(i, j)$-entry of the $n \times n$ incidence table of the network. $P$ is obtained
from the product of simplices by removing the $n$ vertices $\{e_i \times e'_i : i = 1, \ldots, n\}$. To show that $P$ has $3n$ facets,
we will use the fact that $\Delta_{n-1} \times \Delta_{n-1}$ has $2n$ facets whose defining inequalities are $x_i \ge 0$, $x'_i \ge 0$, for
$i = 1, \ldots, n$. Note that these facets correspond to zero margins in the incidence table: for example, $x_i = 0$
refers to the zero margin corresponding to the $i$-th row and $x'_i = 0$ to the zero margin for the $(i + n)$-th row.
Define a new polytope, $P'$, cut out by the following $3n$ inequalities:

$$P' := \{x_i \ge 0, \; x'_i \ge 0, \; x_i + x'_i \le 1, \text{ for all } i\}.$$

We need to show that $P = P'$ and that the defining inequalities are all facets. For the first claim, we
already see that $P \subseteq P'$. Since $\Delta_{n-1} \times \Delta_{n-1}$ is simple, every vertex has as many neighbors as the dimension of the polytope.
Thus, removing the vertex $e_i \times e'_i$ introduces one new facet, namely, $x_i + x'_i \le 1$. Since we are removing
$n$ non-adjacent vertices, $P = P'$. Next, our arguments so far already imply that the $n$ new inequalities
$\{x_i + x'_i \le 1 : i = 1, \ldots, n\}$ define facets, so we need to show that the other $2n$ inequalities, corresponding to
zero row margins, define facets as well. But this follows from the fact that the support sets of each of the
rows of $A$ are facial sets of $P$ and that they are incomparable, in the sense that none of them is contained in
any of the others. Thus, since the lattice of facial sets of $P$ is isomorphic to the face lattice of $P$, the $2n$ null
margins each specify a different facet of $P$. ■



10 Appendix: Computations



In this appendix, we provide details on how to determine whether a given degree sequence belongs to the
interior of the polytope of degree sequences $P_n$ and on how to compute the facial set corresponding to a
degree sequence on the boundary of $P_n$. We will only deal with the polytope $P_n$, even though the arguments
below are general and extend, for instance, to the Rasch model, the Bradley-Terry model and p1 models.

Below, we describe the procedure we used to compute the facial sets of $P_n$. The main difficulty with
working directly with $P_n$ is that this polytope arises as a Minkowski sum and, even though the system of
defining inequalities is given explicitly, its combinatorial complexity grows exponentially in $n$. Furthermore,
we do not have available a set of vertices for $P_n$. Algorithms for obtaining the vertices of $P_n$, such as minksum
(see Weibel, 2005), are computationally expensive and require generating all the points $\{Ax, x \in \mathcal{G}_n\}$, where
$|\mathcal{G}_n| = 2^{\binom{n}{2}}$. In general, even when $n$ is as small as 10, this is not feasible.

Our basic strategy to overcome these problems is quite simple, and entails representing the beta model
as a log-linear model with $\binom{n}{2}$ product-multinomial sampling constraints. Though this re-parametrization
increases the dimensionality of the problem, it nonetheless has the crucial computational advantage of
reducing the determination of the facial sets of $P_n$ to the determination of the facial sets of a pointed polyhedral
cone spanned by $n(n-1)$ vectors, which is a much simpler object to analyze, both theoretically and
algorithmically. This procedure is known as the Cayley embedding in polyhedral geometry, and its use in the analysis
of log-linear models is described in Fienberg and Rinaldo (2011). The advantages of this re-parametrization
are two-fold. First, it allows us to use the highly optimized algorithms available in polymake for listing
explicitly all the facial sets of $P_n$, which is the strategy we used. Secondly, the general algorithms for detecting
nonexistence of the MLE and identifying facial sets proposed in Fienberg and Rinaldo (2011), which can
handle larger dimensional models, can be directly applied to this problem. This reference is also relevant for
dealing with inference under a non-existent MLE.

In the interest of space, we do not provide all the details, and instead only sketch the two main steps of
our procedure.

• Step 1 : Enlarging the space 

In the first step, we switch to a redundant representation of the data by considering all the observed
counts $\{x_{i,j}, i \neq j\}$ and not just $\{x_{i,j}, i < j\}$. We index the points of this enlarged set of $n(n-1)$
numbers as pairs $S'_n = \{(x_{i,j}, x_{j,i}) : i < j\} \subset \mathbb{N}^{n(n-1)}$, with the pairs ordered lexicographically based
on $(i, j)$. For instance, when $n = 4$, any point $x' \in S'_4$ has coordinates indexed by

(1,2), (2,1), (1,3), (3,1), (1,4), (4,1), (2,3), (3,2), (2,4), (4,2), (3,4), (4,3).

It is clear that the sets $S_n$ and $S'_n$ are in one-to-one correspondence with each other and that, for each
corresponding pair $x \in S_n$ and $x' \in S'_n$, $x'_{i,j} = x_{i,j}$ for all $i < j$ and $x'_{j,i} = N_{i,j} - x_{i,j}$ for all $j > i$.

In this new setting, we construct a new polytope $P'_n \subset \mathbb{R}^{2n}$ that is combinatorially equivalent to $P_n$ but
whose facial sets are easier to interpret. This is achieved by first constructing a new design matrix $B$
of dimension $(2n) \times n(n-1)$, with the columns indexed according to the order described above. The
matrix $B$ has the form

$$B = \begin{pmatrix} B_1 \\ B_2 \end{pmatrix},$$

where both $B_1$ and $B_2$ have $n$ rows. For all $i < j$, the column of $B_1$ corresponding to the coordinate
$(i, j)$ and the column of $B_2$ corresponding to the coordinate $(j, i)$ are both equal to $a_{i,j}$, and all the
other columns are zeros. For instance, when $n = 4$,

B =
    1 0 1 0 1 0 0 0 0 0 0 0
    1 0 0 0 0 0 1 0 1 0 0 0
    0 0 1 0 0 0 1 0 0 0 1 0
    0 0 0 0 1 0 0 0 1 0 1 0
    0 1 0 1 0 1 0 0 0 0 0 0
    0 1 0 0 0 0 0 1 0 1 0 0
    0 0 0 1 0 0 0 1 0 0 0 1
    0 0 0 0 0 1 0 0 0 1 0 1
By construction, $d = Ax = B_1 x'$ for any corresponding pair $x \in S_n$ and $x' \in S'_n$. Furthermore, if we
let $d' = B_2 x'$, it is easy to see that $d'$ and $d$ are in one-to-one correspondence with each other. Indeed,
since $x'_{j,i} = N_{i,j} - x_{i,j}$ for all $i < j$,

$$d'_i = \sum_{j \neq i} N_{i,j} - d_i, \quad i = 1, \ldots, n,$$

where we used equation (5) in the last step and the convention $N_{i,j} = N_{j,i}$. Thus, $Bx'$ is also a sufficient
statistic, though highly redundant due to linear dependencies. Next, for any $i < j$, let

$$B_{i,j} = \mathrm{convhull}(\{b_{i,j}, b_{j,i}\}),$$

where $b_{i,j}$ is the column of $B$ indexed by $(i, j)$, and set

$$P'_n = \sum_{i<j} B_{i,j}.$$

The polytopes $P_n$ and $P'_n$ are combinatorially equivalent, even though their ambient dimensions are
different. In fact, using arguments similar to the ones used in the proof of Lemma 3.2, one can
characterize the facial sets of $P'_n$ as follows.

Lemma 10.1. A point $y'$ belongs to the interior of some face $F'$ of $P'_n$ if and only if there exists a set
$\mathcal{F}' \subseteq \{(i, j), i \neq j\}$ such that

$$y' = Bp', \quad (31)$$

where $p' = \{p'_{i,j} : i \neq j, \; p'_{i,j} \in [0, 1], \; p'_{i,j} = 1 - p'_{j,i}\}$ is such that $p'_{i,j} = 0$ for all $(i,j) \notin \mathcal{F}'$ and $p'_{i,j} > 0$ for
all $(i,j) \in \mathcal{F}'$. The set $\mathcal{F}'$ is uniquely determined by the face $F'$ and is a maximal set for which (31) holds.

Because $P_n$ and $P'_n$ are combinatorially equivalent, their co-facial sets are also in one-to-one
correspondence. The advantage of using $P'_n$ instead of $P_n$ is that its co-facial sets arise from entries of $p'$ that
are all zeros, as opposed to the more complicated co-facial sets of $P_n$, which are obtained from entries
of $p = \{p_{i,j} : i < j\}$ which are both ones and zeros. For instance, the co-facial set of $P_n$ corresponding
to the counts reported in Table 1 is $\{(1,2), (3,4)\}$ with $p_{1,2} = 0$ and $p_{3,4} = 1$. In contrast, the
corresponding co-facial set for $P'_n$ is $\{(1,2), (4,3)\}$, with $p'_{1,2} = 0$ and $p'_{4,3} = 0$. Clearly, they convey the same
information.

• Step 2: Lifting 

As we saw, the advantage of the larger polytope $P'_n$ derived in the first step is that, when searching
for co-facial sets, it is enough to consider points of the form $p' = \{p'_{i,j} : i \neq j, \; p'_{i,j} \in [0, 1]\}$ with
zero coordinates only. However, $P'_n$ is still a hard object to deal with computationally, since it is
prescribed as a Minkowski sum of $\binom{n}{2}$ polytopes. In this second step, we lift $P'_n$ to a polyhedral cone
in dimension $2n + \binom{n}{2}$ which is simpler to analyze (in fact, as remarked below, this polyhedral cone
has smaller dimension: $n + \binom{n}{2}$). This cone is spanned by the columns of a matrix $C$ of dimension
$(2n + \binom{n}{2}) \times n(n-1)$ which has the form

$$C = \begin{pmatrix} C_1 \\ B \end{pmatrix},$$

where the rows of $C_1$ are indexed by the pairs $\{(i, j) : i < j\}$ ordered lexicographically. Each row $(i, j)$
of $C_1$ contains all zeroes, except for two ones in the coordinates $(i, j)$ and $(j, i)$. In fact, for any $x' \in S'_n$,
the vector $C_1 x'$ is constant, and its $(i, j)$-th entry is

$$x'_{i,j} + x'_{j,i} = N_{i,j}.$$

For instance, when $n = 4$,

C =
    1 1 0 0 0 0 0 0 0 0 0 0
    0 0 1 1 0 0 0 0 0 0 0 0
    0 0 0 0 1 1 0 0 0 0 0 0
    0 0 0 0 0 0 1 1 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 0 0
    0 0 0 0 0 0 0 0 0 0 1 1
    1 0 1 0 1 0 0 0 0 0 0 0
    1 0 0 0 0 0 1 0 1 0 0 0
    0 0 1 0 0 0 1 0 0 0 1 0
    0 0 0 0 1 0 0 0 1 0 1 0
    0 1 0 1 0 1 0 0 0 0 0 0
    0 1 0 0 0 0 0 1 0 1 0 0
    0 0 0 1 0 0 0 1 0 0 0 1
    0 0 0 0 0 1 0 0 0 1 0 1

Let $D_n = \mathrm{cone}(C)$ be the polyhedral cone spanned by the columns of $C$. The facial sets of $D_n$ are
defined as follows (see, e.g., Geiger et al., 2006). The subset $\mathcal{F} \subseteq \{(i, j) : i \neq j\}$ is a facial set of $D_n$
when there exists a $v \in \mathbb{R}^{2n + \binom{n}{2}}$ such that

$$(v, c_{i,j}) = 0, \; \forall (i,j) \in \mathcal{F} \quad \text{and} \quad (v, c_{i,j}) < 0, \; \forall (i,j) \notin \mathcal{F},$$

where $c_{i,j}$ indicates the column of $C$ indexed by the pair $(i, j)$. It follows that $F$ is a face of $D_n$ if and only
if $F = \mathrm{cone}(\{c_{i,j} : (i,j) \in \mathcal{F}\})$, for some facial set $\mathcal{F}$ of $D_n$, and that there is a one-to-one correspondence
between the facial sets and the faces of $D_n$. Thus, as before, facial sets form a lattice isomorphic to the
face lattice of $D_n$. Following Eriksson et al. (2006), we will call $D_n$ the marginal cone.

The following result shows how one can obtain the facial sets of $P_n$ from the facial sets of $D_n$ through
the facial sets of $P'_n$ (see also Section 3 in Fienberg and Rinaldo (2011)).

Theorem 10.2. Let $p' = \{p'_{i,j} : i \neq j, \; p'_{i,j} \in [0, 1], \; p'_{i,j} = 1 - p'_{j,i}\}$. Then $Bp' \in \mathrm{ri}(P'_n)$ if and only if
$Cp' \in \mathrm{ri}(D_n)$. Furthermore, if $\mathcal{F}'$ is a facial set of $P'_n$, then $\mathcal{F}'$ is a facial set of $D_n$.

Proof. We first define a new polytope $Q_n \subset \mathbb{R}^{2n + \binom{n}{2}}$ which is combinatorially equivalent to $P'_n$ and,
therefore, to the polytope of degree sequences $P_n$. Let $c_{i,j}$ be the column of $C$ indexed by the pair $(i, j)$
and, for each $i < j$, set

$$C_{i,j} := \mathrm{convhull}(\{c_{i,j}, c_{j,i}\})$$

and

$$Q_n = \sum_{i<j} C_{i,j}.$$

By construction, $w \in P'_n$ if and only if

$$\begin{pmatrix} \mathbf{1} \\ w \end{pmatrix} \in Q_n,$$

where $\mathbf{1} \in \mathbb{R}^{\binom{n}{2}}$ is a vector of all ones, which shows that $P'_n$ and $Q_n$ are combinatorially equivalent,
so they have the same facial sets. We make a simple but useful observation: because the first $\binom{n}{2}$
coordinates of any point in $Q_n$ are all ones, and given the pattern of non-zero entries in the first $\binom{n}{2}$
rows of $C$, it must be that if $y \in Q_n$ and $y = Cp'$, the vector $p'$ is of the form $\{p'_{i,j} : i \neq j, \; p'_{i,j} \in
[0, 1], \; p'_{i,j} = 1 - p'_{j,i}\}$.

Since $Q_n \subset D_n$ and both sets are closed, $y \in \mathrm{ri}(Q_n)$ implies that $y \in \mathrm{ri}(D_n)$. As for the converse
statement, suppose $y$ belongs to the interior of a proper face of $Q_n$ with facial set $\mathcal{F}'$. Then, by
Proposition 2.1 in Fukuda (2004), $y$ can be uniquely expressed as

$$y = y_{1,2} + y_{1,3} + \cdots + y_{n-1,n}, \quad (32)$$

where $y_{i,j} \in \mathrm{ri}(C_{i,j})$ if and only if $(i,j)$ and $(j, i)$ are in $\mathcal{F}'$. Equivalently, $y_{i,j} = c_{i,j}$ or $y_{i,j} = c_{j,i}$ if
and only if $(j,i) \notin \mathcal{F}'$ or $(i, j) \notin \mathcal{F}'$, respectively. Arguing by contradiction, suppose that $y \in \mathrm{ri}(D_n)$.
Then, there exists a point $p^* = \{p^*_{i,j} : i \neq j\}$ with strictly positive entries such that $y = Cp^*$. By the
observation above, it must be that $p^*_{i,j} \in (0, 1)$ and $p^*_{i,j} = 1 - p^*_{j,i}$, for all $i < j$. In turn, this implies
that, in equation (32), $y_{i,j} \in \mathrm{ri}(C_{i,j})$ for all $i < j$, i.e. $y_{i,j} \notin \{c_{i,j}, c_{j,i}\}$ for all $i < j$. Then, using again
Proposition 2.1 in Fukuda (2004), $y \in \mathrm{ri}(Q_n)$, a contradiction.

To prove the second claim, notice that the arguments so far yield that, for every proper face $F$ of $Q_n$,
there exists one face $G$ of $D_n$ such that $\mathrm{ri}(F) \subseteq \mathrm{ri}(G)$, so that $\mathcal{F}' \subseteq \mathcal{G}$, where $\mathcal{F}'$ and $\mathcal{G}$ are the facial
sets associated with $F$ and $G$, respectively. We now show that $\mathcal{F}' = \mathcal{G}$. To see this, let $y \in \mathrm{ri}(F)$ for
some face $F$ of $Q_n$ with facial set $\mathcal{F}'$, so that

$$y = Cp'$$

for some $p' = \{p'_{i,j} : i \neq j, \; p'_{i,j} \in [0, 1], \; p'_{i,j} = 1 - p'_{j,i}\}$ such that $p'_{i,j} > 0$ if and only if $(i, j) \in \mathcal{F}'$. On
the other hand, since $y \in \mathrm{ri}(G)$,

$$y = Cp^*,$$

where $p^* = \{p^*_{i,j} : p^*_{i,j} \ge 0\}$ is such that $p^*_{i,j} > 0$ if and only if $(i, j) \in \mathcal{G}$. However, using the observation
above, it must be that $p^*_{i,j} \in [0, 1]$ and $p^*_{i,j} = 1 - p^*_{j,i}$, for all $i < j$. By maximality of the facial sets,
$\mathcal{F}' = \mathcal{G}$, as claimed.

Thus, we have shown that if $\mathcal{F}'$ is a facial set of $Q_n$ and hence of $P'_n$, it is also a facial set of $D_n$. ■



In particular, the only facial sets of $D_n$ that are not facial sets of $Q_n$ are the ones corresponding to
the supports of the first $\binom{n}{2}$ rows of $C$, so that $D_n$ has $\binom{n}{2}$ more facets than $P_n$ (and $\tilde{P}_n$). Since, by
construction, $x'_{i,j} + x'_{j,i} = N_{i,j}$, the point $Cx'$ will never lie in the interior of the $\binom{n}{2}$ facets of $D_n$ whose
facial sets are the supports of the first $\binom{n}{2}$ rows of $C$.

Theorem 10.2 can be used as follows. The MLE exists if and only if $Cx' \in \mathrm{ri}(D_n)$. When the MLE does
not exist, the corresponding facial set of $D_n$ gives the required facial set for $\tilde{P}_n$ and, therefore, for $P_n$.

Finally, it is easy to see that $C$ is rank-deficient due to linear dependencies among its rows, so one
could instead consider the marginal cone spanned by the columns of the matrix

( b! ) • 

which has full dimension $\binom{n}{2} + n$ and is combinatorially equivalent to $D_n$.

The final result of the two-step procedure just outlined is a reparametrization of the beta model in the
form of a log-linear model with the full-rank design matrix given in (33) and a Poisson sampling scheme. The
constraints on the number of observed edges translate into $\binom{n}{2}$ product-multinomial sampling restrictions for
this log-linear model. However, it is well known that the conditions for existence of the MLE are the same
under the Poisson and product-multinomial schemes, so whether we incorporate these constraints or not has
no bearing on parameter estimability. See Haberman (1974, Chapter 2) and Fienberg and Rinaldo (2011,
Section 3.4).

The examples of co-facial sets were obtained by first computing the matrix (33) and then using polymake 
to compute the facial sets of the resulting marginal cone. For a detailed description of the connection with 
log-linear models, and for algorithms to compute the facial sets of this cone that can be used in higher 
dimensions, the reader is referred to Fienberg and Rinaldo (2011). 

Finally, to deal with the Rasch model, the procedure can be trivially modified by eliminating the columns
of $A$ and, in particular, of $C$ that do not correspond to edges between the sets $I$ and $J$ comprising the
bipartition of the node set. The resulting matrix $C$ has dimension $(kl + k + l) \times 2kl$ and has rank
$kl + k + l - 1$, where $k$ and $l$ are the cardinalities of $I$ and $J$, respectively.
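
The stated dimension and rank can be checked numerically on a small case. The sketch below is only an illustration under an assumed layout of the reduced matrix $C$, which is not spelled out in this passage: columns are the $2kl$ ordered pairs between $I$ and $J$, the first $kl$ rows sum the two orientations of each pair, and the remaining $k + l$ rows sum the columns leaving each node. The helper names `rasch_matrix` and `rank` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def rasch_matrix(k, l):
    """Build an ASSUMED (kl + k + l) x 2kl matrix for a bipartition I, J.

    Assumption: columns index ordered pairs (u, v) with u and v on opposite
    sides; this layout is for illustration and may differ from the paper's.
    """
    I = [("I", a) for a in range(k)]
    J = [("J", b) for b in range(l)]
    cols = list(product(I, J)) + [(v, u) for u, v in product(I, J)]
    col_index = {c: idx for idx, c in enumerate(cols)}
    rows = []
    # kl "pair" rows: one per unordered pair, covering both orientations.
    for u, v in product(I, J):
        row = [0] * len(cols)
        row[col_index[(u, v)]] = 1
        row[col_index[(v, u)]] = 1
        rows.append(row)
    # k + l "node" rows: columns whose first endpoint is the given node.
    for u in I + J:
        rows.append([1 if c[0] == u else 0 for c in cols])
    return rows


def rank(rows):
    """Exact rank by Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    for c in range(len(m[0])):
        pivot = next((r for r in range(rk, len(m)) if m[r][c] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(len(m)):
            if r != rk and m[r][c] != 0:
                f = m[r][c] / m[rk][c]
                m[r] = [a - f * b for a, b in zip(m[r], m[rk])]
        rk += 1
        if rk == len(m):
            break
    return rk


k, l = 2, 3
C = rasch_matrix(k, l)
print(len(C), len(C[0]), rank(C))  # 11 12 10
```

Under this layout the single dependency is that the pair rows and the node rows both sum to the all-ones vector, which accounts for the rank deficit of one.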

Algorithms 

We first indicate how the non-existence of the MLE and the determination of the appropriate facial set can be
addressed using simple linear programming. While checking for the existence of the MLE is immediate, the
second task is more demanding. In order to decide whether the MLE exists it is sufficient to establish whether
the observed sufficient statistics $Ax$ belong to the relative interior of $P_n$, which, by Theorem 10.2, happens if
and only if $t := Cx'$ belongs to the relative interior of $D_n$, where for convenience the matrix $C$ can be taken
to be as in (33) (so it has dimension $\left(n + \binom{n}{2}\right) \times n(n-1)$ and is of full rank). In turn, we can decide this by
solving the following simple linear program

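\[
\begin{array}{ll}
\max & s \\
\text{s.t.} & Cx' = t \\
 & x'_{i,j} - s \geq 0, \quad i \neq j \\
 & s \geq 0,
\end{array}
\]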

where the scalar $s$ and the vector $x' = \{x'_{i,j},\ i \neq j\} \in \mathbb{R}^{n(n-1)}$ are the variables. At the optimum $(s^*, x^*)$, the MLE
exists if and only if $s^* > 0$. Though very simple, the previous algorithm may not be sufficient to compute
the support of $p$ if the MLE does not exist. To this end, we need to resort to a more sophisticated algorithm.
Consider the following $n(n-1)$ programs, one for each column of $C$:

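\[
\begin{array}{ll}
\max & \langle c_{i,j}, y \rangle \\
\text{s.t.} & y^\top t = 0 \\
 & C^\top y \geq 0 \\
 & -1 \leq y \leq 1,
\end{array}
\]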

where the last inequalities are taken element-wise. Let $y^*_{i,j} \in \mathbb{R}^{n + \binom{n}{2}}$ denote the solution to the linear
program corresponding to the $(i,j)$-th column of $C$.

Lemma 10.3. The MLE does not exist if and only if $\langle c_{i,j}, y^*_{i,j} \rangle > 0$ for some $(i,j)$, in which case the co-facial
set associated with $t$ is given by

\[ \{ (i,j) : \langle c_{i,j}, y^*_{i,j} \rangle > 0 \}. \]

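Proof. Let $\hat{\mathcal{F}} = \{(i,j) : \langle c_{i,j}, y^*_{i,j} \rangle = 0\}$. If $\hat{\mathcal{F}}$ contains all $n(n-1)$ pairs $(i,j)$, then there does not exist any vector $v \in \mathbb{R}^{n + \binom{n}{2}}$
with $\langle v, t \rangle = 0$ such that $\langle v, c_{i,j} \rangle \geq 0$ for all $(i,j)$, with strict inequality for some $(i,j)$. Thus, the normal cone at $t$ is the zero vector,
so $t \in \mathrm{ri}(D_n)$, and the MLE exists by Theorem 10.2. We now show that if the MLE does not exist, then
$\hat{\mathcal{F}} = \mathcal{F}$, where $\mathcal{F}$ is the facial set associated with the face of $D_n$ whose relative interior contains $t$. To see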
this, let $v = \sum_{(i,j) \in \hat{\mathcal{F}}^c} y^*_{i,j}$. It is clear that $\hat{\mathcal{F}} \subseteq \mathcal{F}$, for otherwise the vector $v$ would produce a strictly larger
facial set, which violates the maximality of $\mathcal{F}$. On the other hand, if $(i,j) \in \mathcal{F} \setminus \hat{\mathcal{F}}$, then there does not exist
any vector $y^*$ in the feasible set of the $(i,j)$-th program such that $\langle y^*, c_{i,j} \rangle \neq 0$. However, the vector $v$
specifying $\hat{\mathcal{F}}$ is clearly in that feasible set and, by definition, $\langle v, c_{i,j} \rangle \neq 0$, which gives a contradiction. Thus
$\mathcal{F} = \hat{\mathcal{F}}$, as claimed. ■
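
The two linear programs above can be sketched with an off-the-shelf solver. The following is a minimal illustration, not the authors' implementation: it assumes NumPy and SciPy are available, uses `scipy.optimize.linprog` with the HiGHS backend, and builds a hypothetical small matrix $C$ for $n = 3$ (pair rows followed by node rows) rather than the matrix in (33). The function names `build_C`, `mle_exists` and `cofacial_set` and the tolerance `tol` are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import linprog


def build_C(n):
    """ASSUMED layout, for illustration only (not the matrix in (33)):
    columns are ordered pairs (i, j), i != j; the first C(n,2) rows sum
    x'_{i,j} + x'_{j,i}; the last n rows sum x'_{i,j} over j for each i."""
    cols = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1) if i != j]
    pairs = [(i, j) for (i, j) in cols if i < j]
    C = np.zeros((len(pairs) + n, len(cols)))
    for r, (i, j) in enumerate(pairs):
        C[r, cols.index((i, j))] = 1
        C[r, cols.index((j, i))] = 1
    for c, (i, _) in enumerate(cols):
        C[len(pairs) + i - 1, c] = 1
    return C, cols


def mle_exists(C, t, tol=1e-9):
    """First program: max s  s.t.  C x' = t,  x' - s >= 0,  s >= 0."""
    m = C.shape[1]
    cost = np.zeros(m + 1)
    cost[-1] = -1.0  # linprog minimizes, so minimize -s
    A_eq = np.hstack([C, np.zeros((C.shape[0], 1))])
    A_ub = np.hstack([-np.eye(m), np.ones((m, 1))])  # s - x'_k <= 0
    res = linprog(cost, A_ub=A_ub, b_ub=np.zeros(m), A_eq=A_eq, b_eq=t,
                  bounds=(0, None), method="highs")
    return bool(res.status == 0 and res.x[-1] > tol)


def cofacial_set(C, t, cols, tol=1e-9):
    """Second family: one program per column, max <c_k, y> subject to
    <y, t> = 0, C^T y >= 0, -1 <= y <= 1; keep columns with optimum > 0."""
    out = []
    for k, col in enumerate(cols):
        res = linprog(-C[:, k], A_ub=-C.T, b_ub=np.zeros(C.shape[1]),
                      A_eq=t.reshape(1, -1), b_eq=[0.0],
                      bounds=(-1, 1), method="highs")
        if res.status == 0 and -res.fun > tol:
            out.append(col)
    return out


C, cols = build_C(3)
t_interior = C @ np.ones(len(cols))  # image of a strictly positive x'
t_boundary = C @ np.array([1.0 if i < j else 0.0 for i, j in cols])

print(mle_exists(C, t_interior))            # True
print(mle_exists(C, t_boundary))            # False
print(cofacial_set(C, t_boundary, cols))    # [(2, 1), (3, 1), (3, 2)]
```

In the boundary example the only nonnegative solution of $Cx' = t$ is the chosen $x'$ itself, so the first program returns $s^* = 0$ and the second family recovers exactly the three zero coordinates. The tolerance replaces the exact comparison with zero, which is a numerical convenience and not part of the lemma.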

See Fienberg and Rinaldo (2011, Section 4.1) for a more refined and efficient implementation of the 
above algorithms. 